Preface


In preparing the sixth edition we have kept in mind the two purposes 
this book has served during the past thirty years. Prior editions have been 
used extensively both as texts for introductory courses in statistics and as 
reference sources of statistical techniques helpful to research workers in 
the interpretation of their data. 

As a text, the book contains ample material for a course extending 
throughout the academic year. For a one-term course, a suggested list 
of topics is given on the page preceding the Table of Contents. As in 
past editions, the mathematical level required involves little more than 
elementary algebra. Dependence on mathematical symbols has been 
kept to a minimum. We realize, however, that it is hard for the reader to 
use a formula with full confidence until he has been given proof of the 
formula or its derivation. Consequently, we have tried to help the reader’s 
understanding of important formulas either by giving an algebraic proof 
where this is feasible or by explaining on common-sense grounds the roles 
played by different parts of the formula. 

This edition retains also one of the characteristic features of the 
book — the extensive use of experimental sampling to familiarize the reader 
with the basic sampling distributions that underlie modern statistical 
practice. Indeed, with the advent of electronic computers, experimental 
sampling in its own right has become much more widely recognized as a 
research weapon for solving problems beyond the current skills of the 
mathematician. 

Some changes have been made in the structure of the chapters, mainly
at the suggestion of teachers who have used the book as a text. The former 
chapter 8 (Large Sample Methods) has disappeared, the retained material 
being placed in earlier chapters. The new chapter 8 opens with an intro- 
duction to probability, followed by the binomial and Poisson distributions 
(formerly in chapter 16). The discussion of multiple regression (chapter 
13) now precedes that of covariance and multiple covariance (chapter 14).




Chapter 16 contains two related topics, the analysis of two-way classifica-
tions with unequal numbers of observations in the sub-classes and the
analysis of proportions in two-way classifications. The first of these
topics was formerly at the end of a long chapter on factorial arrangements;
the second topic is new in this edition. This change seemed advisable for
two reasons. During the past twenty years there has been a marked in-
crease in observational studies in the social sciences, in medicine and public
health, and in operations research. In their analyses, these studies often 
involve the handling of multiple classifications which present complexities 
appropriate to the later sections of the book. 

Finally, in response to almost unanimous requests, the statistical 
tables in the book have been placed in an Appendix. 

A number of topics appear for the first time in this edition. As in 
past editions, the selection of topics was based on our judgment as to 
those likely to be most useful. In addition to the new material on the 
analysis of proportions in chapter 16, other new topics are as follows:

• The analysis of data recorded in scales having only a small number 
of distinct values (section 5.8); 

• In linear regression, the prediction of the independent variable 
X from the dependent variable Y, sometimes called linear calibration
(section 6.14); 

• Linear regression when X is subject to error (section 6.17);

• The comparison of two correlated estimates of variance (section 
7.12); 

• An introduction to probability (section 8.2); 

• The analysis of proportions in ordered classifications (section
9.10); 

• Testing a linear trend in proportions (section 9.11);

• The analysis of a set of 2 x 2 contingency tables (section 9.14);

• More extensive discussion of the effects of failures in the assump-
tions of the analysis of variance and of remedial measures (sections 11.10-11.13);

• Recent work on the selection of variates for prediction in multiple
regression (section 13.13); 

• The discriminant function (sections 13.14, 13.15); 

• The general method of fitting non-linear regression equations and 
its application to asymptotic regression (sections 15.7-15.8). 

Where considerations of space permitted only a brief introduction 
to the topic, references were given to more complete accounts. 

Most of the numerical illustrations continue to be from biological 
investigations. In adding new material, both in the text and in the exam-
ples to be worked by the student, we have made efforts to broaden the
range of fields represented by data. One of the most exhilarating features
of statistical techniques is the extent to which they are found to apply in 
widely different fields of investigation. 

High-speed electronic computers are rapidly becoming available as 
a routine resource in centers in which a substantial amount of data are 
analyzed. Flexible standard programs remove the drudgery of computa- 
tion. They give the investigator vastly increased power to fit a variety of 
mathematical models to his data; to look at the data from different points 
of view; and to obtain many subsidiary results that aid the interpretation.
In several universities their use in the teaching of introductory courses in 
statistics is being tried, and this use is sure to increase. 

We believe, however, that in the future it will be just as necessary 
that the investigator learn the standard techniques of analysis and under- 
stand their meaning as it was in the desk machine age. In one respect, 
computers may change the relation of the investigator to his data in an 
unfortunate way. When calculations are handed to a programmer who
translates them into the language understood by the computer, the investi- 
gator, on seeing the printed results, may lack the self-assurance to query 
or detect errors that arose because the programmer did not fully under- 
stand what was wanted or because the program had not been correctly de- 
bugged. When data are being programmed it is often wise to include a 
similar example from this or another standard book as a check that the 
desired calculations are being done correctly.

For their generous permission to reprint tables we are indebted to 
the late Sir Ronald Fisher and his publishers, Oliver and Boyd; to Maxine 
Merrington, Catherine M. Thompson, Joyce N. May, E. Lord, and E. S. 
Pearson, whose work was published in Biometrika; to C. I. Bliss, E. L.
Crow, C. White, and the late F. Wilcoxon; and to Bernard Ostle and his
publishers, The Iowa State University Press. Thanks are due also to the
many investigators who made data available to us as illustrative exam-
ples, and to teachers who gave helpful advice arising from their experience
in using prior editions as a text. The work of preparing this edition was 
greatly assisted by a contract between the Office of Naval Research, 
Navy Department, and the Department of Statistics, Harvard University. 
Finally, we wish to thank Marianne Blackwell, Nancy Larson, James
DeGracie and Richard Mensing for typing or proofreading, and especially 
Holly Lasewicz for her help at many stages of the work, including the 
preparation of the Indexes. 

George W. Snedecor 

William G. Cochran 
CHAPTER ONE

Sampling of attributes


1.1 — Introduction. The subject matter of the field of statistics has
been described in various ways. According to one definition, statistics 
deals with techniques for collecting, analyzing, and drawing conclusions 
from data. This description helps to explain why an introduction to sta- 
tistical methods is useful to students who are preparing themselves for a 
career in one of the sciences and to persons working in any branch of 
knowledge in which much quantitative research is carried out. Such re- 
search is largely concerned with gathering and summarizing observations 
or measurements made by planned experiments, by questionnaire surveys, 
by the records of a sample of cases of a particular kind, or by combing 
past published work on some problem. From these summaries, the in- 
vestigator draws conclusions that he hopes will have broad validity. 

The same intellectual activity is involved in much other work of im- 
portance. Samples are extensively used in keeping a continuous watch on 
the output of production lines in industry, in obtaining national and 
regional estimates of crop yields and of business and employment condi- 
tions, in the auditing of financial statements, in checking for the possible 
adulteration of foods, in gauging public opinion and voter preferences, in
learning how well the public is informed on current issues, and so on. 

Acquaintance with the main ideas in statistical methodology is also 
an appropriate part of a general education. In newspapers, books, tele- 
vision, radio, and speeches we are all continuously exposed to statements 
that draw general conclusions: for instance, that the cost of living rose by
0.3% in the last month, that the smoking of cigarettes is injurious to health,
that users of “Blank’s” toothpaste have 23% fewer cavities, that a tele-
vision program had 18.6 million viewers. When an inference of this kind
is of interest to us, it is helpful to be able to form our own judgment about 
the truth of the statement. Statistics has no magic formula for doing this 
in all situations, for much remains to be learned about the problem of 
making sound inferences. But the basic ideas in statistics assist us in 
thinking clearly about the problem, provide some guidance about the 
conditions that must be satisfied if sound inferences are to be made, and 
enable us to detect many inferences that have no good logical foundation. 


1.2 — Purpose of this chapter. Since statistics deals with the collection, 
analysis, and interpretation of data, a book on the subject might be ex- 
pected to open with a discussion of methods for collecting data. Instead, 
we shall begin with a simple and common type of data already collected, 
the replies to a question given by a sample of the farmers in a county, and 
discuss the problem of making a statement from this sample that will 
apply to all farmers in the county. We begin with this problem of making 
inferences beyond the data because the type of inference that we are try- 
ing to make governs the way in which the data must be collected. In 
earlier days, and to some extent today also, many workers did not appre-
ciate this fact. It was a common experience for statisticians to be ap-
proached with: “Here are my results. What do they show?” Too often the
data were incapable of showing anything that would have been of interest 
to an investigator, because the method of collecting the data failed to 
meet the conditions needed for making reliable inferences beyond the 
data. 

In this chapter, some of the principal tools used in statistics for mak- 
ing inferences will be presented by means of simple illustrations. The 
mathematical basis of these tools, which lies in the theory of probability, 
will not be discussed until later. Consequently, do not expect to obtain a 
full understanding of the techniques at this stage, and do not worry if the 
ideas seem at first unfamiliar. Later chapters will give you further study 
of the properties of these techniques and enhance your skill in applying 
them to a broad range of problems. 

1.3 — The twin problems of sampling. A sample consists of a small 
collection from some larger aggregate about which we wish information. 
The sample is examined and the facts about it learned. Based on these 
facts, the problem is to make correct inferences about the aggregate or 
population. It is the sample that we observe, but it is the population which 
we seek to know. 

This would be no problem were it not for ever-present variation. If 
all individuals were alike, a sample consisting of a single one would give
complete information about the population. Fortunately, there is end-
less variety among individuals as well as their environments. A conse-
quence is that successive samples are usually different. Clearly, the facts
observed in a sample cannot be taken as facts about the population. Our 
job then is to reach appropriate conclusions about the population despite 
sampling variation. 

But not every sample contains information about the population 
sampled. Suppose the objective of an experimental sampling is to de- 
termine the growth rate in a population of young mice fed a new diet. Ten 
of the animals are put in a cage for the experiment. But the cage gets 
located in a cold draught or in a dark corner. Or an unnoticed infection 
spreads among the mice in the cage. If such things happen, the growth 
rate in the sample may give no worthwhile information about that in the 
population of normal mice. Again, suppose an interviewer in an opinion
poll picks only families among his friends whom he thinks it will be pleas-
ant to visit. His sample may not at all represent the opinions of the popula-
tion. This brings us to a second problem: to collect the sample in such a
way that the sought-for information is contained in it. 

So we are confronted with the twin problems of the investigator: to 
design and conduct his sampling so that it shall be representative of the 
population; then, having studied the sample, to make correct inferences 
about the sampled population. 

1.4 — A sample of farm facts. Point and interval estimates. In 1950
the USDA Division of Cereal and Forage Insect Investigations, cooperat- 
ing with the Iowa Agricultural Experiment Station, conducted an exten- 
sive sampling in Boone County, Iowa, to learn about the interrelation of 
factors affecting control of the European corn borer.* One objective 
of the project was to determine the extent of spraying or dusting for control 
of the insects. To this end a random sample of 100 farmers were inter- 
viewed; 23 of them said they applied the treatment to their corn fields. 
Such are the facts of the sample. 

What inferences can be made about the population of 2,300 Boone 
County farmers? There are two of them. The first is described as a point 
estimate, while the second is called an interval estimate.

1. The point estimate of the fraction of farmers who sprayed is 23%,
the same as the sample ratio; that is, an estimated 23% of Boone County 
farmers sprayed their corn fields in 1950. This may be looked upon as an 
average of the numbers of farmers per hundred who sprayed. From the 
actual count of sprayers in a single hundred farmers it is inferred that the 
average number of sprayers in all possible samples of 100 is 23. 

This sample-to-population inference is usually taken for granted.
Most people pass without a thought from the sample fact to this inference 
about the population. Logically, the two concepts are distinct. It is wise 
to examine the procedure of the sampling before attributing to the popu- 
lation the percentage reported in a sample. 

2. An interval estimate of the point is made by use of table 1.4.1. In 
the first part of the table, indicated by 95% in the heading, look across the 
top line to the sample size of 100, then down the left-hand column to the 
number (or frequency) observed, 23 farmers. At the intersection of the 
column and line you will find the figures 15 and 32. The meaning is this:
one may be confident that the true percentage in the sampled population
lies in the interval from 15% to 32%. This interval estimate is called the
confidence interval. The nature of our confidence will be explained later.

In summary: based on a random sample, we said first that our esti- 
mate of the percentage of sprayers in Boone County was 23%, but we gave 
no indication of the amount by which the estimate might be in error. Next
we asserted confidently that the true percentage was not farther from our 
point estimate, 23%, than 8 percentage points below or 9 above. 
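The entries of table 1.4.1 come from the reference cited in its heading. As a rough cross-check, the sketch below computes "exact" binomial limits by the Clopper-Pearson method using only the Python standard library. That this method is close to the one behind the table is our assumption, and the function names are our own; limits computed this way may differ from the tabled percentages by about a point because of rounding.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); an empty sum (k < 0) gives 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve(g, lo=0.0, hi=1.0, iters=60):
    """Bisection for a root of g on [lo, hi], assuming g changes sign there."""
    lo_positive = g(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(f, n, conf=0.95):
    """Confidence limits for the population fraction when f of n are observed.

    Assumption: Clopper-Pearson limits stand in for the method behind table 1.4.1.
    """
    tail = (1 - conf) / 2
    # Lower limit: the fraction at which observing f or more has probability `tail`.
    lower = 0.0 if f == 0 else solve(lambda p: 1 - binom_cdf(f - 1, n, p) - tail)
    # Upper limit: the fraction at which observing f or fewer has probability `tail`.
    upper = 1.0 if f == n else solve(lambda p: binom_cdf(f, n, p) - tail)
    return lower, upper

# Boone County: 23 sprayers in a random sample of 100 farmers.
for conf in (0.95, 0.99):
    lo, hi = clopper_pearson(23, 100, conf)
    print(f"{conf:.0%} limits: {100 * lo:.1f}% to {100 * hi:.1f}%")
```

For the Boone County figures the computed 95% limits should land close to the 15% and 32% read from the table, with the 99% limits a little wider.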

Let us illustrate these concepts in another fashion. Imagine a bin 

* Data furnished courtesy of Dr. T. A. Brindley.
TABLE 1.4.1

95% Confidence Interval (Per Cent) for Binomial Distribution (1)*

* Reference (1) at end of chapter.

† If f exceeds 50, read 100 - f = number observed and subtract each confidence limit
from 100.

†† If f/n exceeds 0.50, read 1.00 - f/n = fraction observed and subtract each confidence
limit from 100.
TABLE 1.4.1 (Continued)
filled with beans, some white and some colored, thoroughly mixed. Dip
out a scoopful of them at random, count the number of each color and 
calculate the percentage of white, say 40%. Now this is not only a count 
of the percentage of white beans in the sample but it is an estimate of the 
fraction of white beans in the bin. How close an estimate is it? That is 
where the second inference comes in. If there were 250 beans in the scoop, 
we look at the table for size of sample 250, fraction observed = 0.40. From 
the table we say with confidence that the percentage of white beans in the 
bin is between 34% and 46%. 

So far we have given no measure of the amount of confidence which 
can be placed in the second inference. The table heading is “95% Con- 
fidence Interval,” indicating a degree of confidence that can be described 
as follows: If the sampling is repeated indefinitely, each sample leading to
a new confidence interval (that is, to a new interval estimate), then in 95% 
of the samples the interval will cover the true population percentage. If 
one makes a practice of sampling and if for each sample he states that the 
population percentage lies within the corresponding confidence interval, 
about 95% of his statements will be correct. Other and briefer descriptions 
will be proposed later. 
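The repeated-sampling meaning of 95% confidence can be checked by simulation. The sketch below assumes a hypothetical population in which exactly 23% have the attribute, draws many random samples of 100, and records how often the interval covers 0.23. To stay short it swaps in the large-sample interval p ± 1.96√(p(1 - p)/n), with p the sample fraction, instead of the limits of table 1.4.1, so its coverage only approximates 95%.

```python
import random
from math import sqrt

def normal_approx_interval(f, n, z=1.96):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).

    Swapped in for illustration; it is not the method behind table 1.4.1.
    """
    p_hat = f / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def coverage(true_p=0.23, n=100, trials=2000, seed=1):
    """Fraction of simulated samples whose interval covers true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One random sample of size n: count the members having the attribute.
        f = sum(rng.random() < true_p for _ in range(n))
        lo, hi = normal_approx_interval(f, n)
        hits += lo <= true_p <= hi
    return hits / trials

print(f"interval covered the true fraction in {coverage():.1%} of samples")
```

Changing `true_p`, `n`, or `seed` shows the fraction covered hovering near, but not exactly at, 0.95; the large-sample interval tends to fall short more often with small samples or with fractions near 0 or 1.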

If you feel unsafe in making inferences with the chance of being 
wrong in 5% of your statements, you may use the second part of the table, 
“99% Confidence Interval.” For the Boone County sampling the interval 
widens to 13%-35%. If one says that the population percentage lies with-
in these limits, he will be right unless a one-in-a-hundred chance has oc-
curred in the sampling.

If the size of the population is known, as it is in the case of Boone 
County farmers, the point and interval estimates can be expanded from 
percentages to numbers of individuals. There were 2,300 farmers in the 
county. Thus we estimate the number of sprayers in Boone County in 
1950 as 


(0.23)(2,300) = 529 farmers

In the same way, since the 95% confidence interval extends from 15% 
to 32% of the farmers, the 95% limits for the number of farmers who 
sprayed are 

(0.15)(2,300) = 345 farmers and (0.32)(2,300) = 736 farmers
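Expanding the estimates to numbers of farmers is one multiplication by the population size, applied to the point estimate and to each confidence limit alike. A minimal Python sketch using the figures quoted above (the helper `to_counts` is our own name):

```python
N = 2300  # farmers in Boone County, from the text
point, lower, upper = 0.23, 0.15, 0.32  # point estimate and 95% limits as fractions

def to_counts(fraction, population_size):
    """Convert an estimated fraction into an estimated number of individuals."""
    return round(fraction * population_size)

print(to_counts(point, N), to_counts(lower, N), to_counts(upper, N))  # prints: 529 345 736
```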

Two points about interval estimates need emphasis. First, the con- 
fidence statement is a statement about the population ratio, not about 
the ratio in other samples that might be drawn. Second, the uncertainty 
involved comes from the sampling process. Each sample specifies an 
interval estimate. Whether or not the interval happens to include the 
fixed population ratio is a hazard of the process. Theoretically, the 95% 
confidence intervals are determined so that 95% of them will cover the 
true value. 

Before a sample is drawn, one can specify the probability of the truth 
of his prospective confidence statement. He can say, “I expect to take a
random sample and to make an interval estimate from it. The probability 
is 0.95 that the interval will cover the population fraction.” After the 
sample is drawn, however, the confidence statement is either true or it is 
false. Consequently, in reporting the results of the Boone County sam- 
pling, it would be incorrect to say, “The probability is 0.95 that the number 
of sprayers in Boone County in 1950 lies between 345 and 736.” This 
logical point is a subtle one, and does not weaken the effectiveness of 
confidence interval statements. In a specific application, we do not know 
whether our confidence statement is one of the 95% that are correct or one 
of the 5% that are wrong. There are methods, in particular the method
known as the Bayesian approach, that provide more definite probability 
statements about a single specific application, but they require more 
assumptions about the nature of the population that is being sampled. 

The heading of this chapter is “Sampling of Attributes.” In the 
numerical example the attribute in question was whether the farm had 
been sprayed or not. The possession or lack of an attribute distinguishes 
the two classes of individuals making up the population. The data from 
the sample consist of the numbers of members of the sample found to have 
or to lack the attribute under investigation. The sampling of populations 
with two attributes is very common. Examples are Yes or No answers to
a question, Success or Failure in some task, patients Improved or Not
Improved under a medical treatment, and persons who Like or Dislike 
some proposal. Later (chapter 9) we shall study the sampling of popula- 
tions that have more than two kinds of attributes, such as persons who are 
Strongly Favorable, Mildly Favorable, Neutral, Mildly Unfavorable, or
Strongly Unfavorable to some proposal. The theory and methods for 
measurement data, such as heights, weights, or ages, will be considered 
in chapter 2. 

This brief preview displays a goodly portion of the wares that the 
statistician has to offer: the sampling of populations, examination of the 
facts turned up by the sample, and, based on these facts, inferences about 
the sampled population. Before going further, you may clarify your
thinking by working a few examples.

Examples form an essential part of our presentation of statistics.
In each list they are graded so that you may start with the easier. It is
suggested that a few in each group be worked after the first reading of the 
text, reserving the more difficult until experience is enlarged. Statistics 
cannot be mastered without this or similar practice. 

EXAMPLE 1.4.1 — In controlling the quality of a mass-produced article in industry, a
random sample of 100 articles from a large lot were each tested for effectiveness. Ninety-
two were found effective. What are the 99% confidence limits for the percentage of effective
articles in the whole lot? Ans. 83% and 97%. Hint: look in the table for 100 - 92 = 8.

EXAMPLE 1.4.2 — If 1,000 articles in the preceding example had been tested and only
8% found ineffective, what would be the 99% limits? Ans. Between 90% and 94% are effec-
tive. Note how the limits have narrowed as a result of the increased sample size.
EXAMPLE 1.4.3 — A sampler of public opinion asked 50 men to express their prefer-
ences between candidates A and B. Twenty preferred A. Assuming random sampling from
a population of 5,000, the sampler stated that between 1,350 and 2,750 in the population 
preferred A. What confidence interval was he using? Ans. 95%.

EXAMPLE 1.4.4 — In a health survey of adults, 86% stated that they had had measles
at some time in the past. On the basis of this sample the statistician asserted that unless a
1-in-20 chance had occurred, the percentage of adults in the population who had had measles
was between 81% and 90%. Assuming random sampling, what was the size of the sample? 
Ans. 250. Note: the statistician’s inference may have been incorrect for other reasons. 
Some people have a mild attack of measles without realizing it. Others may have forgotten 
that they had it. Consequently, the confidence limits may be underestimates for the per- 
centage in the population who actually had measles, as distinct from the percentage who 
would state that they had it. 

EXAMPLE 1.4.5 — If in the sample of 100 Boone County farmers none had sprayed,
what 95% confidence statement would you make about the farmers in the county? Ans.
Between none and 4% sprayed. But suppose that all farmers in the sample were sprayers,
what is the 99% confidence interval? Ans. 95%-100%.

EXAMPLE 1.4.6 — If you guess that in a certain population between 25% and 75% of 
the housewives own a specified appliance, and if you wish to draw a sample that will, at the 
95% confidence level, yield an estimate differing by not more than 6 from the correct percent- 
age, about how large a sample must you take? Ans. 250. 

EXAMPLE 1.4.7— An investigator interviewed 115 women over 40 years of age from 
the lower middle economic level in rural areas of middlewestern states. Forty-six of them had
listened to a certain radio program three or more times during the preceding month. As- 
suming random sampling, what statement can be made about the percentage of women 
listening in the population, using the 99% interval? Ans. Approximately, between 28.4%
and 52.5% listen. You will need to interpolate between the results for n = 100 and n = 250.
Appendix A 1 (p. 541) gives hints on interpolation.

EXAMPLE 1.4.8 — For samples that show 50% in a certain class, write down the width
of the 95% confidence interval for n = 10, 20, 30, 50, 100, 250, and 1,000. For each sample
size n, multiply the width of the interval by √n. Show that the product is always near 200.
This means that the width of the interval is approximately related to the sample size by the
formula W = 200/√n. We say that the width goes down as 1/√n.
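Examples 1.4.6 and 1.4.8 both turn on how the width of an interval shrinks with n. A large-sample approximation, not derived in this chapter, gives for a sample showing 50% a 95% interval of about 50 ± 1.96 × 100√(0.25/n) percentage points, so W ≈ 196/√n, close to the 200/√n of Example 1.4.8. The sketch below uses that approximation (our substitution for the table) to compute W√n, and solves it for the smallest n whose half-width is at most 6 points, taking p = 0.5 as the least favorable value within the guessed 25%-75% range of Example 1.4.6. Because the table lists only selected sample sizes, its answer of 250 and the formula's need not agree exactly.

```python
from math import ceil, sqrt

Z95 = 1.96  # large-sample multiplier for 95% confidence (an approximation)

def approx_width(n, p=0.5, z=Z95):
    """Approximate width, in percentage points, of the interval p +/- z * SE."""
    return 2 * z * 100 * sqrt(p * (1 - p) / n)

def sample_size(half_width_points, p=0.5, z=Z95):
    """Smallest n with z * 100 * sqrt(p * (1 - p) / n) <= half_width_points."""
    d = half_width_points / 100
    return ceil(z**2 * p * (1 - p) / d**2)

for n in (10, 20, 30, 50, 100, 250, 1000):
    # Under this approximation, width times sqrt(n) is 196 for every n.
    print(n, round(approx_width(n), 1), round(approx_width(n) * sqrt(n), 1))

print(sample_size(6))  # 267 under this approximation (Example 1.4.6)
```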

1.5 — Random sampling. The confidence intervals in table 1.4.1 were
computed mathematically on the assumption that the data are a random 
sample from the population. In its simplest form, random sampling 
means that every member of the population has an equal chance of ap-
pearing in the sample, independently of the other members that happen
to fall in the sample. Suppose that the population has four members, 
numbered 1, 2, 3, 4, and that we are drawing samples of size two. There
are ten possible samples that contain two members: namely, (1, 2), (1, 3), 
(1, 4), (2, 3), (2, 4), (3, 4), (1, 1), (2, 2), (3, 3), and (4, 4). With simple
random sampling, each of these ten samples has an equal chance of being 
the sample that is drawn. Notice two things. Every member appears 
once in three samples and twice in one sample, so that the sampling shows 
no favoritism as between one member and another. Secondly, look at 
the four samples in which a 1 appears, (1, 2), (1, 3), (1, 4), and (1, 1). The
second member is equally likely to be a 1, 2, 3, or 4. Thus, if we are told
that 1 has been drawn as the first member of the sample, we know that 
each member of the population still has an equal chance of being the sec-
ond member of the sample. This is what is meant by the phrase “inde-
pendently of the other members that happen to fall in the sample.”

A common variant of this method of sampling is to allow any mem-
ber of the population to appear only once in the sample. There are then
six possible samples of size two: (1,2), (1,3), (1,4), (2, 3), (2, 4), and (3,4). 
This is the kind of sampling that occurs when two numbers are drawn out 
of a hat, no number being replaced in the hat. This type of sampling is 
called random sampling without replacement, whereas the sampling de-
scribed in the preceding paragraph is random sampling with replacement.
If the sample is a small fraction of the population, the two methods are 
practically identical, since the possibility that the same item appears 
more than once in a sample is negligible. Throughout most of the book 
we shall not distinguish between the two methods. In chapter 17, for- 
mulas applicable to sampling without replacement are presented. 
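The samples of size two from the four-member population can be listed mechanically. The sketch below uses Python's `itertools` to enumerate the unordered samples described in the text, with and without replacement, and the standard `random` module to draw one sample each way; the seed is arbitrary.

```python
import itertools
import random
from collections import Counter

population = [1, 2, 3, 4]

# With replacement: the ten samples listed in the text, e.g. (1, 2) and (1, 1).
with_repl = list(itertools.combinations_with_replacement(population, 2))
# Without replacement: no member may appear twice, leaving six samples.
without_repl = list(itertools.combinations(population, 2))

# Each member appears 5 times in all: once in three samples and twice in one.
appearances = Counter(member for sample in with_repl for member in sample)

rng = random.Random(42)
draw_with = rng.choices(population, k=2)    # the same member may be drawn twice
draw_without = rng.sample(population, k=2)  # like slips drawn from a hat, not replaced

print(len(with_repl), len(without_repl), dict(appearances))
print(draw_with, draw_without)
```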

There are more complex types of random sampling. In all of them, 
every member of the population has a known probability of coming into 
the sample, but these probabilities may not be equal or they may depend, 
in a known way, on the other members that are in the sample. In the
Boone County sampling a book was available showing the location of
every farm in the county. Each farm was numbered so that a random
sample could have been drawn by mixing the numbers thoroughly in a 
box, then having a hundred of them drawn by a blindfolded person. 
Actually, the samplers used a scheme known as stratified random sampling . 
From the farms in each township (a subdivision of the county) they drew 
a random sample with a size proportional to the number of farms in that 
township. In this example, each farm still has an equal chance of appear- 
ing in the sample, but the sample is constructed to contain a specified 
number from every township. The chief advantage is to spread the sam-
ple more uniformly over the county, retaining the principle of random-
ness within each township. Statistical methods for stratified samples
are presented in chapter 17. The conclusions are only slightly altered by
considering the sample completely random. Unless otherwise mentioned,
we will use the phrases “random sample” and “random sampling” to
denote the simplest type of random sampling with replacement as de-
scribed in the first paragraph of this section.
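Proportional allocation, as used in the Boone County survey, can be sketched as follows. The township names and sizes are hypothetical, chosen only to total 2,300 farms, and the largest-remainder rule for rounding each township's share to whole farms is our assumption, since the text does not say how fractional shares were handled.

```python
import random
from math import floor

def proportional_allocation(sizes, n):
    """Split a total sample of n across strata in proportion to stratum size."""
    total = sum(sizes.values())
    exact = {name: n * size / total for name, size in sizes.items()}
    alloc = {name: floor(share) for name, share in exact.items()}
    # Rounding down leaves a few units unassigned; give them to the strata
    # with the largest fractional parts (largest-remainder rounding, assumed).
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(frames, n, rng):
    """Simple random sample (without replacement) within each stratum."""
    alloc = proportional_allocation({name: len(units) for name, units in frames.items()}, n)
    return {name: rng.sample(units, alloc[name]) for name, units in frames.items()}

# Hypothetical townships; farms are numbered consecutively, as in a county list.
sizes = {"Township A": 600, "Township B": 550, "Township C": 650, "Township D": 500}
frames, start = {}, 0
for name, size in sizes.items():
    frames[name] = list(range(start, start + size))
    start += size

sample = stratified_sample(frames, 100, random.Random(1950))
print({name: len(farms) for name, farms in sample.items()})
```

Every farm still has (up to rounding) the same chance of selection, about 100 in 2,300, but each township is guaranteed its share of the sample.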

An important feature of all random sampling schemes is that the 
sampler has no control over the specific choice of the units that appear 
in the sample. If he exercises judgment in this selection, by choosing 
“typical” members or excluding members that appear “atypical,” his 
results are not amenable to probability theory, and confidence intervals, 
which give valuable information about the accuracy of estimates made 
from the sample, cannot be constructed. 

In some cases the population is thoroughly mixed before the sample 
is taken, as illustrated by the macerating and blending of food or other
chemical products, by a naturally mixed aggregate such as the blood
stream, or by the sampling of a liquid from a vat that has been repeatedly
stirred. Given an assurance of thorough mixing, the sample can be drawn 
from the most accessible part of the population, because any sample 
should give closely similar results. But complete mixing in this sense is 
often harder to achieve than is realized. With populations that are vari- 
able but show no clear pattern of variation, there is a temptation to con- 
clude that the population is naturally mixed in a random fashion, so that 
any convenient sample will behave like one randomly drawn. This 
assumption is hazardous, and is difficult to verify without a special in- 
vestigation. 

One way of drawing a random sample is to list the members of the 
population in some order and write these numbers on slips of paper, 
marbles, beans, or small pieces of cardboard. These are placed in a box or 
bag, mixed carefully, and drawn out, with eyes shut, one by one until 
the desired size of sample is reached. With small populations this method 
is convenient, and was much used in the past for classroom exercises. 
It has two disadvantages. With large populations it is slow and unwieldy. 
Further, tests sometimes show that if a large number of samples are drawn, 
the samples differ from random samples in a noticeable way, for instance 
by having certain members of the population present more frequently 
than they should be. In other words, the mixing was imperfect. 
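The slips-in-a-box procedure amounts to drawing, without replacement, from a numbered list of the members of the population. The following is a minimal sketch, not a method given in this book. It assumes the members are numbered 1 to N and lets Python's standard `random.sample` play the part of a perfectly mixed box. The function name `draw_sample` is illustrative only.

```python
import random

def draw_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct member numbers from 1..population_size,
    every set of members being equally likely (like perfectly mixed slips)."""
    rng = random.Random(seed)  # seeded generator, so a draw can be repeated exactly
    members = range(1, population_size + 1)
    return sorted(rng.sample(members, sample_size))

print(draw_sample(population_size=537, sample_size=20, seed=1))
```

Because the generator is seeded, the same sample can be drawn again later. This serves the same purpose as noting the rows and columns used in a printed table.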

1.6 — Tables of random digits. Nowadays, samples are mostly drawn 
by the use of tables of random digits. These tables are produced by a 
process — usually mechanical or electrical — that gives each of the digits 
from 0 to 9 an equal chance of appearing at every draw. Before publica- 
tion of the tables, the results of the drawings are checked in numerous 
ways to ensure that the tables do not depart materially from randomness 
in a manner that would vitiate the commonest usages of the tables. Table 
A 1 (p. 543) contains 10,000 such digits, arranged in 5 × 5 blocks to facilitate
reading. There are 100 rows and 100 columns, each numbered from
00 to 99. Table 1.6.1 shows the first 100 numbers from this table. 

TABLE 1.6.1
One Hundred Random Digits From Table A 1

        00-04   05-09   10-14   15-19
  00    54463   22662   65905   70639
  01    15389   85205   18850   39226
  02    85941   40756   82434   02015
  03    61149   69440   11286   88218
  04    05219   81619   10651   67079

The chaotic appearance of the set of numbers is evident. To illustrate
how the table is used with attribute data, suppose that 50% of the
members of a population answer “Yes” to some question. We wish to
study how well the proportion answering “Yes” is estimated from a
sample of size 20. A “Yes” answer can be represented by the appearance
of one of the digits 0, 1, 2, 3, 4, or alternatively by the appearance of an
odd digit. With either choice, the probability of a “Yes” at any draw
from the table is one-half. We shall choose the digits 0, 1, 2, 3, 4 to represent
“Yes,” and let each row represent a different sample of size 20. A
count, much quicker than drawing slips of paper from a box, shows
that the successive rows in table 1.6.1 contain 9, 9, 12, 11, and 9 “Yes”
answers. Thus, the proportions of “Yes” answers in these five samples
of size 20 are, respectively, 0.45, 0.45, 0.60, 0.55, and 0.45. Continuing
in this way we can produce estimates of the proportion of “Yes” answers
given by a large number of separate samples of size 20, and then
examine how close the estimates are to the population value. In counting
the row numbered 02, you may notice a run of results that is typical
of random sampling. The row ends with a succession of eight consecu- 
tive “Yes” answers, followed by a single “No.” Observing this phe- 
nomenon by itself, one might be inclined to conclude that the proportion 
in the population must be larger than one-half, or that something is 
wrong with the sampling process. 
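The count described above is easy to check mechanically. The sketch below takes the five rows of table 1.6.1 as printed and treats the digits 0 to 4 as “Yes.” It also finds the longest run of consecutive “Yes” answers in the row numbered 02. The names `count_yes` and `longest_yes_run` are illustrative.

```python
# The five rows of table 1.6.1 (columns 00-19), in blocks of five digits
rows = [
    "54463 22662 65905 70639",
    "15389 85205 18850 39226",
    "85941 40756 82434 02015",
    "61149 69440 11286 88218",
    "05219 81619 10651 67079",
]

YES_DIGITS = set("01234")  # the digits 0-4 stand for a "Yes" answer

def count_yes(row):
    """Number of "Yes" digits in one row, i.e. in one sample of size 20."""
    return sum(1 for d in row.replace(" ", "") if d in YES_DIGITS)

def longest_yes_run(row):
    """Length of the longest unbroken run of "Yes" digits in a row."""
    best = run = 0
    for d in row.replace(" ", ""):
        run = run + 1 if d in YES_DIGITS else 0  # a "No" resets the run
        best = max(best, run)
    return best

counts = [count_yes(r) for r in rows]
print(counts)                     # [9, 9, 12, 11, 9]
print([c / 20 for c in counts])   # [0.45, 0.45, 0.6, 0.55, 0.45]
print(longest_yes_run(rows[2]))   # 8
```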

Table A 1 can also be used to investigate sampling in which the pro- 
portion in the population is any of the numbers 0.1, 0.2, 0.3, . . . 0.9. 
With 0.3, for example, we let the digits 0, 1, or 2 represent the presence of 
the attribute and the remaining seven digits its absence. If you are inter- 
ested in a population in which the proportion is 0.37, the method is to select 
pairs of digits, letting any pair between 00 and 36 denote the presence of 
the attribute. Tables of random digits are employed in studying a wide 
range of sampling problems. You can probably see how to use them to
answer such questions as: On the average, how many digits must be taken
until a 1 appears? Or, how frequently does a 3 appear before either a
1 or a 9 has appeared? In fact, sampling from tables of random digits
has become an important technique for solving difficult problems in
probability for which no mathematical solution is known at present.
This technique goes by the not inappropriate name of the Monte Carlo
method. For this reason, modern electronic computing machines have
programs available for creating their own tables of random digits as they
proceed with their calculations.
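Here is a small Monte Carlo sketch of the two devices just mentioned. Python's pseudo-random generator stands in for table A 1; this is an assumption, and a printed table would serve equally well.

- The first function answers the question of how many digits must be taken until a 1 appears. Each digit is a 1 with probability 1/10, so the theoretical average is 10.
- The second function uses pairs of digits, with 00 to 36 marking the attribute, to sample from a population in which the proportion is 0.37.

The function names are illustrative.

```python
import random

def digits_until_one(rng):
    """Number of random digits drawn up to and including the first 1."""
    count = 0
    while True:
        count += 1
        if rng.randrange(10) == 1:
            return count

def attribute_present(rng, cutoff=37):
    """Draw a two-digit number 00-99; pairs 00..36 denote the attribute,
    so the attribute appears with probability 0.37."""
    return rng.randrange(100) < cutoff

rng = random.Random(2024)  # pseudo-random digits stand in for table A 1
trials = 100_000
mean_wait = sum(digits_until_one(rng) for _ in range(trials)) / trials
share = sum(attribute_present(rng) for _ in range(trials)) / trials
print(round(mean_wait, 2), round(share, 3))  # close to 10 and to 0.37
```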

To the reader who is using random numbers for his own purposes, 
we suggest that he start on the first page and proceed systematically 
through the table. At the end of any problem, note the rows and columns 
used and the direction taken in counting. This is sometimes needed for 
later reference or in communicating the results to others. Since no digit
is used more than once, the table may become exhausted, but numerous
tables are available. Reference (2) contains 1 million digits. In classroom
use, when a number of students are working from the same table, obtaining
samples whose results will be put together, different students can start
at different parts of the table and also vary the direction in which they
proceed, in order to avoid duplicating the results of others.
1.7 — Confidence interval: verification of theory. One who draws
samples from a known population is likely to be surprised at the capricious
way in which the items turn up. It is a salutary discipline for a student
or investigator to observe the laws of chance in action lest he become too
confident of his professional samplings. At this point we recommend that
a number of samples be selected from a population in which the proportion
of “Yes” answers is one-half. Vary the sample sizes, choosing some
of each of the sizes 10, 15, 20, 30, 50, 100, and 250 for which confidence
intervals are given in table 1.4.1 (1,000 is too large). For each sample,
record the sample size and the set of rows and columns used in the table
of random digits. From the number of “Yes” answers and the sample
size, read table 1.4.1 to find the 95% and 99% confidence intervals for the
percentage of “Yes” answers in the population. For each sample, you
can then verify whether the confidence interval actually covers 50%. If
possible, draw 100 or more samples, since a large number of samples is
necessary for any close verification of the theory, particularly with 99%
intervals. In a classroom exercise it is wise to arrange for combined
presentation and discussion of the results from the whole class. Preserve
the results (sample sizes and numbers of “Yes” answers) since they will 
be used again later. 
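The experiment can also be run by computer. Table 1.4.1 is not reproduced here. This sketch therefore substitutes the simple normal-approximation interval p̂ ± 1.96√(p̂(1 − p̂)/n) for the tabled intervals. That substitution is an assumption, and the tabled intervals will differ somewhat, especially for small samples.

- An odd digit counts as “Yes,” so the population proportion is 0.50.
- The function `coverage` (an illustrative name) reports the fraction of samples whose interval covers 50%.

```python
import math
import random

def normal_interval(yes, n, z=1.96):
    """Approximate interval p_hat +/- z*sqrt(p_hat*(1 - p_hat)/n) for a proportion.
    This is a stand-in for table 1.4.1, not the tabled intervals themselves."""
    p_hat = yes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(n, samples, z=1.96, seed=0):
    """Fraction of `samples` random samples of size n (odd digit = "Yes",
    so the true proportion is 0.5) whose interval covers 0.5."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(samples):
        yes = sum(rng.randrange(10) % 2 for _ in range(n))  # number of odd digits
        low, high = normal_interval(yes, n, z)
        if low <= 0.5 <= high:
            covered += 1
    return covered / samples

for n in (10, 15, 20, 30, 50, 100, 250):
    print(n, round(coverage(n, 1000), 3))
```

For the larger samples the printed coverages should lie near 0.95, and somewhat lower for the smallest sizes, where the approximation is poor. Passing z=2.576 gives approximate 99% intervals.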

You have now done experimentally what the mathematical statis- 
tician does theoretically when he studies the distribution of samples 
drawn at random from a specified population. 

For illustration, suppose that an odd digit represents a “Yes”
answer, and that the first sample, of size 50, is the first column of table A 1.
Counting down the column, you will find 24 odd digits. From table 1.4.1,
the 95% confidence interval extends from 36% to 64%, a correct verdict
because it includes the population value of 50%. But suppose one of your
samples of 250 had started at row 85, column 23. Moving down the successive
columns you would count only 101 odd digits, or 40.4%, and would
assert that the true value is between 34% and 46%. You would be wrong
despite the fact that the sample was randomly drawn from the same population
as the others. This sample merely happens to be unusually divergent.
You should find about five samples in a hundred leading to incorrect
statements, but there is no occasion for surprise if only three,
or as many as seven, turn up. With confidence probability 99% you expect,
of course, only about one statement in a hundred to be wrong. We
hope that your results are sufficiently concordant with theory to give
you confidence in it. You will certainly be more aware of the vagaries
of sampling, and this is one of the objectives of the experiment. Another
lesson to be learned is that only broad confidence intervals can be based
on small samples, and that even so the inference can be wrong.

Finally, as is evident in table 1.4.1, you may have observed that the 
interval narrows rather slowly with increasing sample size. For samples 
of size 100 that show a percentage of “Yes” answers anywhere between 
40% and 60%, the 95% confidence interval is consistently of width 20%.
With a sample ten times as large (n = 1,000) the width of the interval
decreases to 6%. The width is roughly inversely proportional to the square
root of the sample size, since 20/6 is 3.3 and √10 is 3.2 (this result was verified in
example 1.4.8). 
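The square-root rule can be checked with a normal approximation. This is again an assumption, not the method behind table 1.4.1. For a proportion near one-half, the approximate 95% width is 2(1.96)√(0.25/n). The function name `interval_width` is illustrative.

```python
import math

def interval_width(n, p=0.5, z=1.96):
    """Approximate width of a 95% interval for a proportion near p (normal approximation)."""
    return 2 * z * math.sqrt(p * (1 - p) / n)

w100 = interval_width(100)     # about 0.196, i.e. roughly 20 percentage points
w1000 = interval_width(1000)   # about 0.062, i.e. roughly 6 percentage points
print(round(w100, 3), round(w1000, 3), round(w100 / w1000, 2))  # 0.196 0.062 3.16
```

The ratio of the two widths is exactly √10 under this approximation, because n enters only through 1/√n.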

Failure to make correct inferences in a small portion of the samples 
is not a fault that can be remedied, but a fault inevitably bound up in the 
sampling procedure. Fallibility is in the very nature of such evidence. 
The sampler can only take available precautions, then prepare himself for 
his share of mistakes. In this he is not alone. The journalist, the judge, 
the banker, the weather forecaster— these along with the rest of us are 
subject to the laws of chance, and each makes his own quota of wrong 
guesses. The statistician has this advantage: he can, in favorable circumstances,
know his likelihood of error.

1.8 — The sampled population. Thus far we have learned that if we 
want to obtain some information about a population that is too large to 
be completely studied, one way to do this is to draw a random sample 
and construct point and interval estimates, as in the Boone County exam- 
ple. This technique of making inferences from sample to population is 
one of the principal tools in the analysis of data. The data, of course, 
represent the sample, but the concept of the population requires further 
discussion. In many investigations in which data are collected, the population
is quite specific, apart possibly from some problems of definition:
the patients in a hospital on a particular day, the payments received by a
firm during the preceding year, and so on. In such cases the investigator 
often proceeds to select a simple random sample, or one of the more 
elaborate methods of sampling to be presented in chapter 17, and makes 
inferences directly from his sample to his population. 

With a human population, however, the population actually sampled 
may be narrower than the original population because some persons 
drawn into the sample cannot be located, are ill, or refuse to answer the 
questions asked. Non-responses of this kind in 5% to 15% of the sample 
are not uncommon. The population to which statistical inferences apply 
must be regarded as the aggregate of persons who would supply answers 
if drawn into the sample. 

Further, for reasons of feasibility or expense, much research is carried 
out on populations that are greatly restricted as compared to the popula- 
tion about which, ideally, the investigator would like to gain information. 
In psychology and education the investigator may concentrate on the 
students at a particular university, although he hopes to find results that 
apply to all young men in the country of college age. If the measuring 
process is troublesome to the person being measured, the research worker 
may have to depend on paid volunteers. In laboratory research on ani- 
mals the sample may be drawn from the latest group of animals sent from 
the supply house. In many of these cases the sampled population, from 
the viewpoint of statistical inference, is hard to define concretely. It is the 
kind of population of which the data can be regarded as a random sample. 
Confidence interval statements apply to the population that was
actually sampled. Claims that such inferences apply to some more exten- 
sive population must rest on the judgment of the investigator or on addi- 
tional extraneous information that he possesses. Careful investigators 
take pains to describe any relevant characteristics of their data in order 
that the reader can envisage the nature of the sampled population. The 
investigator may also comment on ways in which his sampled population 
appears to differ from some broader population that is of particular 
interest. As is not surprising, results soundly established in narrow popu- 
lations are sometimes shown to be erroneous in much broader popula- 
tions. Fortunately, local studies that claim important results are usually 
repeated by investigators in other parts of the country or the world, so 
that a more extensive target population is at least partially sampled in 
this way. 

1.9 — The frequency distribution and its graphical representation. 
One group of students drew 200 samples, each of size 10. The combined 
results are compactly summarized in a frequency distribution, shown in
table 1.9.1. There are only eleven possible results for the number of odd
digits in a sample, namely the integers 0, 1, 2, ..., 10. Consequently, the
frequency distribution has eleven classes. The number of samples out of
the 200 that fall into a class is the class frequency. The sum of the class
frequencies is, of course, the total number of samples drawn, 200. The 
classes and their frequencies give a complete summary of the drawings. 

This type of frequency distribution is called discrete, because the
variable, number of odd digits, can take only a limited number of distinct 
values. Later we shall meet continuous frequency distributions, which are 
extensively used with measurement data. 

TABLE 1.9.1
Frequency Distribution of Numbers of Odd Digits in 200 Samples of n = 10

Class                      Class        Theoretical
(Number of Odd Digits)     Frequency    Class Frequency
   0                            1             0.2
   1                            1             2.0
   2                            8             8.8
   3                           25            23.4
   4                           39            41.0
   5                           45            49.2
   6                           36            41.0
   7                           25            23.4
   8                           16             8.8
   9                            4             2.0
  10                            0             0.2
Total frequency               200           200.0

One striking feature of the sampling distribution is the concentration
of frequencies near the middle of the table. The greatest frequency
is in the class of five odd digits; that is, half odd and half even. The three
middle classes, 4, 5, 6, contain 39 + 45 + 36 = 120 samples, more than 
half of the total frequency. This central tendency is the characteristic 
that gives us confidence in sampling — most samples furnish close esti- 
mates of the population fraction of odds. This should counterbalance the 
perhaps discouraging fact that some of the samples are notably divergent. 

Another interesting feature is the symmetry of the distribution, the 
greatest frequency at the center with a trailing away at each end. This is 
because the population fraction is 50%; if the percentage were nearer zero 
or 100, the frequencies would pile up at or near one end. 

The regularity that has appeared in the distribution shows that chance 
events follow a definite law. The turning up of odd digits as you counted 
them may have seemed wholly erratic: whether an odd or an even would 
come next was a purely chance event. But the summary of many such 
events reveals a pattern which may be predicted (aside from sampling 
variation). 

Instead of showing the class frequencies in table 1.9.1, we might have 
divided each class frequency by 200, the number of samples, obtaining a 
set of relative class frequencies that add to 1. As the number of samples is
increased indefinitely, these relative frequencies tend to certain fixed
values that can be calculated from the theory of probability. The theoretical
distribution computed in this way is known as the binomial distribution.
It is one of the commonest distributions in statistical work. In
general terms, the formula for the binomial distribution is as follows. 
Suppose that we are drawing samples of size n and that the attribute in 
question is held by a proportion p of the members of the population. The 
relative frequency of samples containing r members having the attribute, 
or in other words the probability that a sample will contain r members 
having the attribute, is 

$$\frac{n(n-1)(n-2)\cdots(n-r+1)}{r(r-1)(r-2)\cdots(2)(1)}\,p^{r}(1-p)^{n-r}$$

In the numerator the expression n(n − 1)(n − 2) ⋯ (n − r + 1) means
“multiply together all the integers from n down to (n − r + 1), inclusive.”
Similarly, the denominator is a shorthand way of writing
the instruction “multiply together all integers from r down to 1.” We
shall study the binomial distribution and its mathematical derivation in 
chapter 8. 

What does this distribution look like for our sampling in table 1.9.1?
We have n = 10 and p = 1/2. The relative frequency or probability of a
sample having four odd digits is, putting r = 4 so that (n − r + 1) = 7,

$$\frac{(10)(9)(8)(7)}{(4)(3)(2)(1)}\left(\frac{1}{2}\right)^{4}\left(\frac{1}{2}\right)^{6} = 210\left(\frac{1}{2}\right)^{10} = \frac{210}{1024}$$




As already mentioned, these relative frequencies add to 1. (This is 
not obvious by looking at the formula, but comes from a well-known 
result in algebra.) Hence, in our 200 samples of size 10, the number that 
should theoretically have four odd digits is 

$$\frac{(200)(210)}{1024} = 41.0$$

These theoretical class frequencies are given in the last column of table 
1.9.1. The agreement between the actual and theoretical frequencies is 
pleasing. 
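The whole theoretical column of table 1.9.1 can be computed from the formula. In this Python sketch, `math.comb(n, r)` (available from Python 3.8) supplies the fraction n(n − 1) ⋯ (n − r + 1)/(r(r − 1) ⋯ (2)(1)). The function name `binomial_probability` is illustrative.

```python
from math import comb

def binomial_probability(n, r, p):
    """Probability that a sample of size n contains exactly r members
    having the attribute, when the population proportion is p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p, samples = 10, 0.5, 200
theoretical = [round(samples * binomial_probability(n, r, p), 1) for r in range(n + 1)]
print(theoretical)
# [0.2, 2.0, 8.8, 23.4, 41.0, 49.2, 41.0, 23.4, 8.8, 2.0, 0.2]
print(sum(binomial_probability(n, r, p) for r in range(n + 1)))  # 1.0
```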

The graph in figure 1.9.1 brings out the features of the binomial 
distribution. On the horizontal axis are marked off the different classes — 
the numbers of odd digits. The solid ordinate beside each class number
is the observed class frequency, while the dotted ordinate represents the
theoretical frequency. This is the type of graph appropriate for discrete
distributions. 



Fig. 1.9.1 — Frequency distribution of number of odd digits in each of 200 samples of size
10. The dotted lines represent the theoretical binomial distribution from which the samples
were drawn.
EXAMPLE 1.9.1 — For the 200 samples of size 10 in table 1.9.1, in how many cases is
(i) the 95% confidence interval statement wrong? (ii) the 99% confidence interval statement 
wrong? Ans. (i) 6 times, or 3.0%; (ii) 1 time, or 0.5%. 

EXAMPLE 1.9.2 — Use the table of random digits to select a random sample of 20 
pages of this book, regarding the population as consisting of pages 3-539. Note the number 
of pages in your sample that do not contain the beginning of a new section, and calculate 
the 95% interval for the proportion of pages in the book on which no new section begins.
Don’t count “References” as a section. The population proportion is 317/537 = 0.59. 

EXAMPLE 1.9.3 — When the doors of a clinic are opened, twelve patients enter simul- 
taneously. Each patient wishes to be handled first. Can you use the random digit table to 
arrange the patients in a random order? 

EXAMPLE 1.9.4 — A sampler of public opinion estimates from a sample the number of
eligible voters in a state favoring a certain candidate for governor. Assuming that his esti- 
mate was close to the population value at the time the survey was made, suggest two reasons 
why the ballot on election day might be quite different. 

EXAMPLE 1.9.5 — A random sample of families from a population has been selected. 
An interviewer calls on each family at its home between the hours of 9 a.m. and 5 p.m. If
no one is at home, the interviewer makes no attempt to contact the family at a later time. For
each of the following attributes, give your opinion whether the sample results are likely to
overestimate, underestimate, or be at about the correct level: (i) proportion of families in
which the husband is retired, (ii) proportion of families with at least one child under 4 years,
(iii) proportion of families in which husband and wife both work. Give your reasons.

EXAMPLE 1.9.6 — From the formula for the binomial distribution, calculate the probability
of 0, 1, 2 “Yes” answers in a sample of size 2, where p is the proportion of “Yes”
answers in the population. Show that the three probability values add to 1 for any value of p. 

EXAMPLE 1.9.7 — At birth the probability that a child is a boy is very close to
one-half. Show that according to the binomial distribution, half the families of size 2 should
consist of one boy and one girl. Why is the proportion of boy-girl families likely to be slightly
less than one-half in practice?

EXAMPLE 1.9.8 — Five dice were tossed 100 times. At each toss the number of twos
(deuces) out of five was noted, with these results:

Number of Deuces    Frequency of    Theoretical
per Toss            Occurrence      Frequency
      5                   2             0.013
      4                   3             0.322
      3                   3             3.214
      2                  18            16.075
      1                  42            40.188
      0                  32            40.188
  Total                 100           100.000


(i) From the binomial distribution, verify the result 16.075 for the theoretical frequency
of 2 deuces. (ii) Draw a graph showing the observed and theoretical distributions. (iii) Do
you think the dice were balanced and fairly tossed? Ans. The binomial probability of 2
deuces is 1250/7776 = 0.16075. This is multiplied by 100 to give the theoretical frequency.
A later test (example 9.5.1) casts doubt on the throws.
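Part (i) of example 1.9.8 can be checked with exact fractions. The sketch assumes a balanced die, so the probability of a deuce is 1/6. It uses Python's `fractions.Fraction` so that 1250/7776 appears exactly; Python reduces it to 625/3888. The function name `binomial_probability` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, r, p):
    """Binomial probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

p_deuce = Fraction(1, 6)  # balanced die: a two turns up with probability 1/6
probs = [binomial_probability(5, r, p_deuce) for r in range(6)]  # r = 0..5 deuces

print(probs[2])  # 625/3888, the same as 1250/7776
frequencies = [round(float(100 * pr), 3) for pr in probs]  # expected in 100 tosses
print(frequencies)  # [40.188, 40.188, 16.075, 3.215, 0.322, 0.013]
```

Rounded separately, the six theoretical frequencies sum to 100.001. The 3.214 printed in the example therefore appears to have been adjusted so that the column totals exactly 100.000.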
1.10 — Hypotheses about populations. The investigator often has in
mind a definite hypothesis about the population ratio, the purpose of 
the sampling being to get evidence concerning his hypothesis. Thus a 
geneticist studying heredity in the tomato had reason to believe that in 
the plants produced from a certain cross, fruits with red flesh and yellow 
flesh would be in the ratio 3:1. In a sample of 400 he found 310 red tomatoes
instead of the hypothetical 300. With your experience of sampling
variation, would you accept this as verification or refutation of the hy- 
pothesis? Again, a physician has the hypothesis that a certain disease 
requiring hospitalization is equally common among men and women. 
In a sample of 900 hospital cases he finds 480 men and 420 women. Do 
these results support or contradict his hypothesis? (Incidentally, this is
an example in which the sampled population may differ from the target 
population. Although good medical practice may prescribe hospitaliza- 
tion, there are often cases that for one reason or another do not come to 
a hospital and therefore could not be included in his sample.) 

To answer such questions two results are needed, a measure of the 
deviation of the sample from the hypothetical population ratio, and a 
means of judging whether this measure is an amount that would commonly 
occur in sampling, or, on the contrary, is so great as to throw doubt upon 
the hypothesis. Both results were furnished by Karl Pearson in 1899 (3). 
He devised an index of dispersion or test criterion denoted by χ² (chi-square)
and obtained the formula for its theoretical frequency distribution
when the hypothesis in question is true. Like the binomial distribution, 
the chi-square distribution is another of the basic theoretical distributions 
much used in statistical work. Let us first examine the index of dispersion. 

1.11 — Chi-square, an index of dispersion. Naturally, the deviations 
of the observed numbers from those specified by the hypothesis form the 
basis of the index. In the medical example, with 900 cases, the numbers 
of male and female cases expected on the hypothesis are each 450. The 
deviations, then, are

$$480 - 450 = +30, \qquad 420 - 450 = -30,$$

the sum of the two being zero. The value of chi-square is given by

$$\chi^2 = \frac{(+30)^2}{450} + \frac{(-30)^2}{450} = 2 + 2 = 4$$


Each deviation is squared, each square is divided by the hypothetical or
expected number, and the results are added. The expected numbers appear
in the denominators in order to introduce sample size into the quantity — 
it is the relative size that is important. 

The squaring of the deviations in the numerator may puzzle you. 
It is a common practice in statistics. We shall simply say at present that
indexes constructed in this way have been found to have great flexibility, 
being applicable to many different types of statistical data. Note that the 
squaring makes the sign of the deviation unimportant, since the square of a 
negative number is the same as that of the corresponding positive number. 
It is clear that chi-square would be zero if the sample frequencies were the 
same as the hypothetical, and that it will increase with increasing deviation 
from the hypothetical. But it is not at all clear whether a chi-square value 
of 4 is to be considered large, medium, or small. 

To furnish a basis for judgment on this point is our next aim. Pearson 
founded his judgment from a study of the theoretical distribution of chi- 
square, but we shall investigate the same problem by setting up a sampling 
experiment. Before doing this, a useful formula will be given, together 
with a few examples to help fix it in mind. 

1.12 — The formula for chi-square. It is convenient to represent by 
f₁ and f₂ the sample counts of individuals who do and do not possess the
attribute being investigated, the corresponding hypothetical or expected
frequencies being F₁ and F₂. The two deviations, then, are f₁ − F₁ and
f₂ − F₂, so that chi-square is given by the formula

$$\chi^2 = \frac{(f_1 - F_1)^2}{F_1} + \frac{(f_2 - F_2)^2}{F_2}$$

The formula may be condensed to the more easily remembered as well as
more general one,

$$\chi^2 = \sum \frac{(f - F)^2}{F},$$

where Σ denotes summation. In words, “Chi-square is the sum of such
ratios as

(deviation squared)/(expected number).”

Let us apply the formula to the counts of red and yellow tomatoes
in section 1.10. There, f₁ = 310, f₂ = 400 − 310 = 90, F₁ = 3/4 of
400 = 300, and F₂ = 1/4 of 400 = 100. Whence,

$$\chi^2 = \frac{(310 - 300)^2}{300} + \frac{(90 - 100)^2}{100} = 0.33 + 1.00 = 1.33$$
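The formula translates directly into a few lines of code. Below is a minimal Python sketch; it assumes the observed and expected counts are supplied in the same class order. The function name `chi_square` is illustrative. It reproduces the value 4 for the medical example of section 1.11 and 1.33 for the tomatoes.

```python
def chi_square(observed, expected):
    """Chi-square: the sum of (deviation squared)/(expected number) over the classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))

# Medical example of section 1.11: 480 men and 420 women, 450 of each expected
print(chi_square([480, 420], [450, 450]))            # 4.0
# Tomatoes of section 1.10: 310 red and 90 yellow, 300 and 100 expected (3:1)
print(round(chi_square([310, 90], [300, 100]), 2))   # 1.33
```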

Note. When computing chi-square it is essential to use the actual size
of sample and the actual numbers in the two attribute classes. If we know
only the percentages or proportions in the two classes, chi-square cannot
be calculated. Suppose we are told that 80% of the tomato plants in a
sample are red, and asked to compute chi-square. If we guess that the
sample contained 100 plants, then

$$\chi^2 = \frac{(80 - 75)^2}{75} + \frac{(20 - 25)^2}{25} = \frac{25}{75} + \frac{25}{25} = 1.33$$
But if the sample actually contained only 10 plants, then

$$\chi^2 = \frac{(8 - 7.5)^2}{7.5} + \frac{(2 - 2.5)^2}{2.5} = \frac{0.25}{7.5} + \frac{0.25}{2.5} = 0.133$$

If the sample had 1,000 plants, a similar calculation finds χ² = 13.33.
For a given percentage red, the value of chi-square can be anything from 
almost zero to a very large number. 
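The dependence on sample size shows up clearly when the calculation is repeated for several sample sizes, each with 80% red. The sketch below assumes the 3:1 hypothesis, so 75% red is expected. The function name `chi_square_from_percentage` is illustrative.

```python
def chi_square_from_percentage(percent_red, n, hypothesis=0.75):
    """Chi-square for a sample of n plants of which percent_red % are red,
    tested against a hypothetical proportion red (0.75 for the 3:1 ratio)."""
    red = n * percent_red / 100
    # (observed, expected) for the red and the yellow class
    pairs = [(red, n * hypothesis), (n - red, n * (1 - hypothesis))]
    return sum((f - F) ** 2 / F for f, F in pairs)

for n in (10, 100, 1000):
    print(n, round(chi_square_from_percentage(80, n), 3))
# 10 0.133
# 100 1.333
# 1000 13.333
```

With the percentage held fixed, both deviations grow in proportion to n while the squares grow as n². Chi-square is therefore itself proportional to the sample size.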

EXAMPLE 1.12.1 — A student tossed a coin 800 times, getting 440 heads. What is the 
value of chi-square in relation to the hypothesis that heads and tails are equally likely? 
Ans. 8. 


EXAMPLE 1.12.2 — If the count in the preceding example had been 220 heads out of 
400 tosses, would chi-square also be half its original value? 

EXAMPLE 1.12.3 — A manufacturer of a small mass-produced article claims that 96% 
of the articles function properly. In an independent test of 1,000 articles, 950 were found to
function properly. Compute chi-square. Ans. 2.60. 

EXAMPLE 1.12.4 — In the text example about tomatoes the deviation from expectation
was 10. If the same deviation had occurred in a sample of twice the size (that is, of 800), 
what would have been the value of chi-square? Ans. 0.67, half the original value. 


1.13 — An experiment in sampling chi-square; the sampling distribution.
You have now had some practice in the calculation of chi-square. Its 
main function is to enable us to judge whether the sample ratio itself de- 
parts much or little from the hypothetical population value. For that 
purpose we must answer the question already proposed: What values of 
chi-square are to be considered as indicating unusual deviation, and what 
as ordinary sampling variation? Our experimental method of answering 
the question will be to calculate chi-square for each of many samples 
drawn from the table of random numbers, then to observe what values of 
chi-square spring from the more unusual samples. If a large number of 
samples of various sizes have been drawn and if the value of chi-square is 
computed from each, the distribution of chi-square may be mapped. 

The results to be presented here come from 230 samples of sizes varying
from 10 to 250, drawn from the random digits table A 1. We suggest
that the reader use the samples that he drew in section 1.7 when verifying 
the confidence interval statements. There is a quick method of calculat- 
ing chi-square for all samples of a given size n. Since odd and even digits 
are equally likely in the population, the expected numbers of odd and even 
digits are F₁ = F₂ = n/2. The reciprocals of these numbers are therefore
both equal to 2/n. Remembering that the two deviations are the same in
absolute value and differ only in sign, we may write

$$\chi^2 = (f_1 - F_1)^2\left(\frac{1}{F_1} + \frac{1}{F_2}\right) = d^2\left(\frac{2}{n} + \frac{2}{n}\right) = \frac{4d^2}{n}$$

where d is the absolute value of the deviation. For all samples of a fixed 
size the multiplier 4/n is constant. Once it has been calculated it can be 
used again and again. 
To illustrate, suppose that n = 100. The multiplier 4/n is 0.04. If 56
odd digits are found in a sample, d = 6 and

$$\chi^2 = (0.04)(6^2) = 1.44$$
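As a check on the short-cut formula, this sketch computes chi-square both ways for every possible count of odd digits in samples of the seven sizes used here. As in the text, it assumes p = 0.50. The two function names are illustrative.

```python
SIZES = (10, 15, 20, 30, 50, 100, 250)

def chi_square_shortcut(odd, n):
    """Short-cut chi-square 4d^2/n, valid when odd and even are each expected n/2 times."""
    d = abs(odd - n / 2)  # absolute value of the deviation
    return 4 * d ** 2 / n

def chi_square_general(odd, n):
    """Sum of (f - F)^2 / F over the odd and even classes."""
    expected = n / 2
    return ((odd - expected) ** 2 + ((n - odd) - expected) ** 2) / expected

print(chi_square_shortcut(56, 100))  # 1.44
```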

Proceed to calculate chi-square for each of your samples. To 
summarize the results, a frequency distribution is again convenient. There 
is one difference, however, from the discrete frequency distribution used 
in section 1.9 when studying the binomial distribution. With the binomial
for n = 10, there were only eleven possible values for the numbers of odd 
digits, so that the eleven classes in the frequency distribution selected 
themselves naturally. On the other hand, with chi-square values calcu- 
lated from samples of different sizes, there is a large number of possible 
values. Some grouping of the values into classes is necessary. A distribution
of this type is sometimes described as continuous, since conceptually
any positive number is a possible value of chi-square. 

When forming frequency distributions from continuous data, decide 
first on the classes to be used. For most purposes, somewhere between 
8 and 20 classes is satisfactory. Obtain an idea of the range of the data 
by looking through them quickly to spot low and high values. Most of 
your chi-squares will be found to lie between 0 and 5. Equal-sized class 
intervals of 0.00-0.49, 0.50-0.99, . . . will therefore cover most of the 
range in 10 classes, although a few values of chi-square greater than 5 may 
occur. Our values of χ² were recorded to 2 decimal places.

Be sure to make the classes non-overlapping, and indicate clearly 
what the class intervals are. Class intervals described as "‘0.00-0.50,” 
“0.50-1.00,” “1.00-1.50” are not satisfactory, since the reader does not 
know in what classes the values 0.50 and 1.00 have been placed. If the 
chi-square values were originally computed to three decimal places, re- 
ported class intervals of “0.00-0.49,” "‘0.50-0.99.” and so on, would be 

TABLE 1.13.1 

Sampling Distribution of 230 Values of Chi-Sou are Calculated From Samples 
Drawn From Table A 1 
Sample sizes- 10, 15, 20, 30, 50, 100, and 250 


Class Interval 

Frequency 

Class Interval 

Frequency 

0.00-0.49 

116 

6.00- 649 

0 

0.50-0.99 

39 

6.50- 6.99 

1 

1.00-1.49 

18 

7.00- 749 

0 

1.50-1.99 

22 

7 50- 7.99 

0 

2.00-2.49 

12 

8.00- 849 

0 

2.50-2.99 

5 

8.50- 8.99 

I 

3.00-3.49 

5 

9.00- 949 

0 

3.50-3.99 

6 

9.50- 9.99 

0 

4.00-4.49 

1 

10.00-1049 

1 

4.50-4.99 

2 

10.50-10.99 

0 

5.00-549 

0 

11.00-1149 

1 

5.50-5.99 

0 

Total 

230 
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ambiguous, since it is not clear where a chi-square value of 0.493 is placed. 
Intervals of 0.000-0.494, 0.495-0.999, and so on, could be used. 

Having determined the class intervals, go through the data system- 
atically, assigning each value of chi-square to its proper class, then 
counting the number of values (frequency) in each class. Table 1.13.1 
shows the results for our 230 samples. 
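The procedure of choosing non-overlapping classes and tallying can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: Python's pseudo-random generator stands in for table A 1, the proportion of odd digits is taken as 0.50 (as recommended below), and the sample sizes are simply cycled, since the text does not say how many samples of each size were drawn. The helper names (`chi_square_odd_even`, `class_index`, `class_label`, `simulate`) are hypothetical. The resulting frequencies will not reproduce table 1.13.1; the point is the method. Half-open classes, [0, 0.5), [0.5, 1.0), and so on, guarantee that a value such as 0.50 falls in exactly one class.

```python
import random
from collections import Counter

def chi_square_odd_even(digits, p_odd=0.5):
    """Chi-square for the observed numbers of odd and even digits in one sample."""
    n = len(digits)
    odd = sum(1 for d in digits if d % 2 == 1)
    even = n - odd
    exp_odd, exp_even = n * p_odd, n * (1 - p_odd)
    return (odd - exp_odd) ** 2 / exp_odd + (even - exp_even) ** 2 / exp_even

def class_index(x, width=0.5):
    """Index k of the half-open class [k*width, (k+1)*width); classes never overlap."""
    return int(x // width)

def class_label(k, width=0.5):
    """Label in the style of table 1.13.1, for values recorded to 2 decimals."""
    lo = k * width
    return f"{lo:.2f}-{lo + width - 0.01:.2f}"

def simulate(num_samples=230, sizes=(10, 15, 20, 30, 50, 100, 250), seed=1):
    """Chi-squares of simulated digit samples (hypothetical stand-in for table A 1)."""
    rng = random.Random(seed)
    chi_squares = []
    for i in range(num_samples):
        n = sizes[i % len(sizes)]  # assumption: cycle through the listed sample sizes
        digits = [rng.randrange(10) for _ in range(n)]
        chi_squares.append(chi_square_odd_even(digits))
    return chi_squares

chi_squares = simulate()
freq = Counter(class_index(x) for x in chi_squares)
for k in range(max(freq) + 1):
    print(class_label(k), freq.get(k, 0))
print("Total", sum(freq.values()))
```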

In computing chi-square, we chose to regard the population as con- 
sisting of the 10,000 random digits in table A 1, rather than as an infinite 
population of random digits. Since 5,060 of the digits in table A 1 are 
odd, we took the probability of an odd digit as 0.506 instead of 0.50. We
recommend that the reader use 0.50, as already indicated. The change
makes only minor differences in the distribution of the sample values of 
chi-square. 

Observe the concentration of sample chi-squares in the smallest class, 
practically half of them being less than 0.5. Small deviations (with small 
chi-squares) are predominant, this being the foundation of our faith in 
sampling. But taking a less optimistic view, one must not overlook the 
samples with large deviations and chi-squares. The possibility of getting 
one of these makes for caution in drawing conclusions. In this sampling 
exercise we know the population ratio and are not led astray by discrepant 
samples. In actual investigations, where the hypothesis set up is not 
known to be the right one, a large value of chi-square constitutes a
dilemma. Shall we say that it denotes only an unusual sample from the
hypothetical population, or shall we conclude that the hypothesis
misrepresents the true population ratio? Statistical theory contains no certain
answer. Instead, it furnishes an evaluation of the probability of possible
sample deviations from the hypothetical population. If chi-square is large,
the investigator is warned that the sample is an improbable one under his 
hypothesis. This is evidence to be added to that which he already pos- 
sesses, all of it being the basis for his decisions. A more exact determina- 
tion of probability will be explained in section 1.15. 

The graphical representation of the distribution of our chi-squares
appears in figure 1.13.1. In this kind of graph, called a histogram, the
frequencies are represented by the areas of the rectangular blocks in the
figure. The graph brings out both the concentration of small chi-squares
at the left and the comparatively large sizes of a few at the right. It is now
evident that for the medical example in section 1.11, χ² = 4 is larger than
a great majority of the chi-squares in this distribution. If this disease were
in fact equally likely to result in male or female hospitalized cases, this
would be an unusually large value of chi-square.

1.14 — Comparison with the theoretical distribution. Two features of
our chi-square distribution have yet to be examined: (i) How does it compare
with the theoretical distribution? and (ii) How can we evaluate more
exactly the probabilities of various chi-square sizes? For these purposes
a rearrangement of the class intervals is advisable. Since our primary
interest is in the relative frequency of high values of chi-square, we used
the set of class intervals defined by column 4 of table 1.14.1. The first three
intervals each contain 25% of the theoretical distribution. As chi-square
increases, the next four intervals contain respectively 15%, 5%, 4%, and
1%. Since the theoretical distribution is known exactly and has been
widely tabulated, the corresponding class intervals for chi-square, shown
in column 1, are easily obtained. Note that the intervals are quite unequal.

TABLE 1.14.1

Comparison of the Sample and Theoretical Distributions of Chi-Square

                    Sample Frequency       Theoretical
                    Distribution           Distribution     Cumulative
Class Interval      Actual   Percentage    Percentage       χ²        Per Cent Greater Than
of Chi-square
     (1)             (2)        (3)            (4)          (5)          (6)
0-0.1015             57        24.8            25           0             100
0.1015-0.455         59        25.6            25           0.1015         75
0.455-1.323          62        27.0            25           0.455          50
1.323-2.706          32        13.9            15           1.323          25
2.706-3.841          14         6.1             5           2.706          10
3.841-6.635           3         1.3             4           3.841           5
6.635-                3         1.3             1           6.635           1
Total               230       100.0           100

Column 2 of table 1.14.1 shows the actual frequencies obtained from 
the 230 samples. In column 3, these have been converted to percentage 
frequencies, by multiplying by 100/230, for comparison with the theoreti- 
cal percentage frequencies in column 4. The agreement between columns 
3 and 4 is good. If your chi-square values have been computed mostly 
from small samples of sizes 10, 15, and 20, your agreement may be poorer.
With small samples there is only a limited number of distinct values of chi- 
square, so that your sample distribution goes by discontinuous jumps. 

Columns 5 and 6 contain a cumulative frequency distribution of the
percentages in column 4. Beginning at the foot of column 6, each entry
is the sum of the entries in column 4 from that row down to the foot of
the table, hence the name. The column is read in this way: the third
entry from the end means that 10% of all samples in the theoretical
distribution have chi-squares greater than 2.706. Again, 50% of them
exceed 0.455; this may be looked upon as an average value, exceeded as
often as not in the sampling. Finally, chi-squares greater than 6.635 are
rare, occurring only once per 100 samples. So in this sampling distribution
of chi-square we find a measure in terms of probability, the measure we
have been seeking to enable us to say exactly which chi-squares are to be
considered small and which large. We are now to learn how this measure
can be utilized.
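Because chi-square with one degree of freedom behaves as the square of a standard normal deviate, its tail probability can be written P(χ² > x) = erfc(√(x/2)). The Python sketch below assumes that relation and uses only `math.erfc`; the name `chi2_1df_upper` is hypothetical. It recomputes column 6 of table 1.14.1 from the boundaries in column 5, and column 4 as differences of successive entries of column 6.

```python
import math

def chi2_1df_upper(x):
    """P(chi-square with 1 degree of freedom > x), using chi-square = Z**2."""
    return math.erfc(math.sqrt(x / 2))

# Lower class boundaries of table 1.14.1 (column 5).
bounds = [0, 0.1015, 0.455, 1.323, 2.706, 3.841, 6.635]

# Column 6: per cent of the theoretical distribution greater than each boundary.
greater = [100 * chi2_1df_upper(b) for b in bounds]

# Column 4: per cent inside each class = this tail minus the next tail (0 after the last).
within = [g - nxt for g, nxt in zip(greater, greater[1:] + [0])]

for b, g, w in zip(bounds, greater, within):
    print(f"{b:>7}  greater than: {g:5.1f}%  in class: {w:4.1f}%")
```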

1.15 — The test of a null hypothesis or test of significance. As indicated 
in section 1.10, the investigator’s objective can often be translated into a 
hypothesis about his experimental material. The geneticist, you remem- 
ber, knowing that the Mendelian theory of inheritance produced a 3 : 1 
ratio, set up the hypothesis that the tomato population had this ratio of 
red to yellow fruits. This is called a null hypothesis, meaning that there
is no difference between the hypothetical ratio and that in the population
of tomato fruits. If this null hypothesis is true, then random samples of
n will have ratios distributed binomially, and chi-squares calculated from
the samples will be distributed as in table 1.14.1. To test the hypothesis,
a sample is taken and its chi-square calculated; in the illustration the
value was 1.33. Reference to the table shows that, if the null hypothesis
is true, 1.33 is not an uncommon chi-square, the probability of a greater
one being about 0.25. As the result of this test, the geneticist would
probably not reject the null hypothesis. He knows, of course, that he may be in
error, that the population ratio among the tomato fruits may not be 3 : 1. 
But the discrepancy, if any, is so small that the sample has given no con- 
vincing evidence of it. 

Contrasting with the genetic experiment, the medical example turned
up χ² = 4. If the null hypothesis (this disease equally likely in men and
women) is true, a larger chi-square has a probability of only about 0.05.
This suggests that the null hypothesis is false, so the sampler would likely
reject it. As before, he may be in error because this might be one of those
5 samples per 100 that have chi-squares greater than 3.841 even when the
sampling is from an equally divided population. In rejecting the null 
hypothesis, the sampler faces the possibility that he is wrong. Such is the 
risk always run by those who test hypotheses and rest decisions on the 
tests. 
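The two decisions can be reproduced from the tail formula for one degree of freedom, P = erfc(√(χ²/2)). This Python sketch (the name `p_value_1df` is illustrative) computes P for the tomato chi-square of 1.33 and the medical chi-square of 4, and applies a rejection rule at the 5% level.

```python
import math

def p_value_1df(chi2):
    """Probability that chi-square (1 degree of freedom) exceeds the observed value."""
    return math.erfc(math.sqrt(chi2 / 2))

for label, chi2 in [("tomato (3:1 ratio)", 1.33), ("medical (1:1 ratio)", 4.0)]:
    p = p_value_1df(chi2)
    decision = "reject" if p < 0.05 else "do not reject"
    print(f"{label}: chi-square = {chi2}, P = {p:.3f}, {decision} at the 5% level")
```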

The illustrations show that in testing hypotheses one is liable to 
two kinds of error. If his sample leads him to reject the null hypothesis 
when it is true, he is said to have committed an error of the first kind, or a
Type I error. If, on the contrary, he is led to accept the hypothesis when
it is false, his error is of the second kind, a Type II error. The Neyman-
Pearson theory of testing hypotheses emphasizes the relations between 
these types. For recent accounts of this theory see references (6, 7, 8). 

As a matter of practical convenience, probability levels of 5% (0.05)
and 1% (0.01) are commonly used in deciding whether to reject the null
hypothesis. As seen from table 1.14.1, these correspond to χ² greater
than 3.841 and χ² greater than 6.635, respectively. In the medical example
we say that the difference in the number of male and female patients
is significant at the 5% level, because it signifies rejection of the null
hypothesis of equal numbers.
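The probability of a Type I error at the 5% level can be checked exactly rather than by sampling. When the null hypothesis is true, the number of successes in a sample of n is binomial, so we can add the binomial probabilities of every outcome whose chi-square exceeds 3.841. The Python sketch below does this for a few sample sizes chosen for illustration; the function names are hypothetical. Because a binomial sample yields only a limited set of chi-square values, the exact rate comes out near 0.05 rather than exactly 0.05.

```python
from math import comb

def chi_square(successes, n, p0):
    """Chi-square for a two-class sample: successes vs. failures, null proportion p0."""
    e1, e2 = n * p0, n * (1 - p0)
    return (successes - e1) ** 2 / e1 + (n - successes - e2) ** 2 / e2

def type_one_error_rate(n, p0=0.5, critical=3.841):
    """Exact probability of rejecting p = p0 when it is true (binomial sampling)."""
    return sum(
        comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(n + 1)
        if chi_square(k, n, p0) > critical
    )

for n in (20, 100, 250):
    print(n, round(type_one_error_rate(n), 4))
```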

This use of 5% and 1% levels is simply a working convention. There 
is merit in the practice, followed by some investigators, of reporting in 
parentheses the probability that chi-square exceeds the value found in 
their data. For instance, in the counts of red and yellow tomatoes, we
found χ² = 1.33, a value exceeded with probability about 0.25. The report
might read: "The χ² test was consistent with the hypothesis of a
3 to 1 ratio of red to yellow tomatoes (P = 0.25)."

The values of χ² corresponding to a series of probability levels are
shown below. This table should be used in working the exercises that
follow.

Probability of a Greater Value

P      0.90   0.75   0.50   0.25   0.10   0.05   0.025   0.010   0.005
χ²     0.02   0.10   0.45   1.32   2.71   3.84   5.02    6.63    7.88

EXAMPLE 1.15.1 — Two workers A and B perform a task in which carelessness leads to
minor accidents. In the first 20 accidents, 13 happened to A and 7 to B. Is this evidence
against the hypothesis that the two men are equally liable to accidents? Compute χ² and
find the significance probability. Ans. χ² = 1.8. P between 0.10 and 0.25.


EXAMPLE 1.15.2 — A baseball player has a lifetime batting average of 0.280. (This
means that the probability that he gets a hit when at bat is 0.280.) Starting a new season, he
gets 15 hits in his first 30 times at bat. Is this evidence that he is having what is called a hot
streak? Compute χ² for the null hypothesis that his probability of hitting is still 0.280. Ans.
χ² = 7.20. P < 0.01. Null hypothesis is rejected.

EXAMPLE 1.15.3 — In some experiments on heredity in the tomato, MacArthur (5)
counted 3,629 fruits with red flesh and 1,176 with yellow. This was in the F₂ generation
where the theoretical ratio was 3:1. Compute χ² = 0.71 and find the significance probability.
MacArthur concluded that "the discrepancies between the observed and expected ratios
are not significant."

EXAMPLE 1.15.4 — In a South Dakota farm labor survey of 1943, 480 of the 1,000
reporting farmers were classed as owners (or part owners), the remaining 520 being renters.
It is known that of nearly 7,000 farms in the region, 47% are owners. Assuming this to be the
population percentage, calculate chi-square and P for the sample of 1,000. Ans. χ² = 0.41,
P = 0.50. Does this increase your confidence in the randomness of the sampling? Such
collateral evidence is often cited. The assumption is that if the sample is shown to be
representative for one attribute it is more likely to be representative also of the attribute under
investigation, provided the two are related.

EXAMPLE 1.15.5 — James Snedecor (4) tried the effect of injecting poultry eggs with 
female sex hormones. In one series 2 normal males were hatched together with 19 chicks 
which were classified as either normal females or as individuals with pronounced female 
characteristics. What is the probability of the ratio 2 : 19, or one more extreme, in sampling 
from a population with equal numbers of the sexes in which the hormone has no effect? 
Ans. χ² = 13.76, P is much less than 0.01.

EXAMPLE 1.15.6 — In table 1.14.1, there are 62 + 32 + 14 + 3 + 3 = 114 samples
having chi-squares greater than 0.455, whereas 50%, or 115, were expected. What is the
probability of drawing a more discrepant sample if the sampling is truly random? Ans. χ²
= 0.0174, P ≈ 0.90. Make the same test for your own samples.
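The arithmetic behind several of these answers can be checked with a short Python function implementing χ² = Σ(Observed − Expected)²/Expected, where each expected count is n times the hypothesized proportion. The function name and the dictionary of examples are illustrative only.

```python
def chi_square(observed, proportions):
    """Sum of (observed - expected)**2 / expected, with expected = n * proportion."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))

examples = {
    "1.15.1": chi_square([13, 7], [0.5, 0.5]),          # accidents to A and to B
    "1.15.2": chi_square([15, 15], [0.280, 0.720]),     # hits and non-hits in 30 at bats
    "1.15.3": chi_square([3629, 1176], [0.75, 0.25]),   # red and yellow fruits, 3:1
    "1.15.5": chi_square([2, 19], [0.5, 0.5]),          # males vs. the other 19 chicks
    "1.15.6": chi_square([114, 116], [0.5, 0.5]),       # above and below 0.455
}
for name, value in examples.items():
    print(f"Example {name}: chi-square = {value:.4f}")
```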

EXAMPLE 1.15.7 — This example illustrates the discontinuity in the distribution of
chi-square when computed from small samples. From 100 samples of size 10 drawn from the
random digits table A 1, the following frequency distribution of the numbers of odd digits in
a sample was obtained.

Number of odd digits    1 or 9    2 or 8    3 or 7    4 or 6    5
Frequency                  2         8        19        46     25

Compute the sample frequency distribution of χ² as in table 1.14.1 and compare it with the
theoretical distribution. Observe that no sample χ² occurs in the class interval 0.455-1.323,
although 25% of the theoretical distribution lies in this range.
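With samples of size 10 only a handful of chi-square values are possible, which is why some classes stay empty. This Python sketch computes chi-square for each number of odd digits (a count and its mirror image, such as 1 and 9, give the same chi-square, which is why the example pools them) and places the frequencies from the example into the classes of table 1.14.1. Variable names are illustrative.

```python
# Frequencies from Example 1.15.7: number of odd digits -> number of samples.
# The key records one member of each pooled pair (1 for "1 or 9", and so on).
freq = {1: 2, 2: 8, 3: 19, 4: 46, 5: 25}
n = 10

def chi_square(odd, n):
    """Chi-square for `odd` odd digits in a sample of n, null proportion 1/2."""
    e = n / 2
    return (odd - e) ** 2 / e + (n - odd - e) ** 2 / e

# Class boundaries of table 1.14.1; the last class is open-ended.
bounds = [0, 0.1015, 0.455, 1.323, 2.706, 3.841, 6.635, float("inf")]
theoretical = [25, 25, 25, 15, 5, 4, 1]  # column 4, per cent
counts = [0] * (len(bounds) - 1)
for odd, f in freq.items():
    x = chi_square(odd, n)
    for i in range(len(counts)):
        if bounds[i] <= x < bounds[i + 1]:
            counts[i] += f
            break

# With 100 samples, the counts are also the sample percentages.
for i, (c, t) in enumerate(zip(counts, theoretical)):
    print(f"{bounds[i]}-{bounds[i + 1]}: sample {c}%, theoretical {t}%")
```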


1.16 — Tests of significance in practice. A test of significance is some- 
times thought to be an automatic rule for making a decision either to 
“accept” or “reject” a null hypothesis. This attitude should be avoided. 
An investigator rarely rests his decisions wholly on a test of significance.
To the evidence of the test he adds knowledge accumulated from his own
past work and from the work of others. The size of the sample from which
the test of significance is calculated is also important. With a small
sample, the test is likely to produce a significant result only if the null
hypothesis is very badly wrong. An investigator's report on a small sample test
might read as follows: “Although the deviation from the null hypothesis 
was not significant, the sample is so small that this result gives only a 
weak confirmation of the null hypothesis.” With a large sample, on the 
other hand, small departures from the null hypothesis can be detected 
as statistically significant. After comparing two proportions in a large 
sample, an investigator may write: “Although statistically significant, 
the difference between the two proportions was too small to be of practical 
importance, and was ignored in the subsequent analysis.” 
In this connection, it is helpful, when testing a binomial proportion
at the 5% level, to look at the 95% confidence limits for the population p.
Suppose that in the medical example the number of patients was only
n = 10, of whom 4 were female, so that the sample proportion of female
patients was 0.4. If you test the null hypothesis p = 0.5 by χ², you will find
χ² = 0.4, a small value entirely consistent with the null hypothesis.
Looking now at the 95% confidence limits for p, we find from table 1.4.1
(p. 000) that these are 15% and 74%. Any value of the population p lying
between 15% and 74% is also consistent with the sample result. Clearly,
the fact that we found a non-significant result when testing the null
hypothesis p = 1/2 gives no assurance from these data that the true p is
1/2 or near to 1/2.
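The same point can be seen by applying the chi-square test to several candidate values of p, all lying between 15% and 74%. In this Python sketch (names illustrative) none of them is rejected at the 5% level by the sample of 4 females in 10, so the data cannot single out p = 1/2. With expected counts as small as 2, the chi-square approximation is rough here; the sketch is meant only to illustrate the argument, not to replace the confidence limits of table 1.4.1.

```python
def chi_square(successes, n, p0):
    """Chi-square for successes vs. failures under the null proportion p0."""
    e1, e2 = n * p0, n * (1 - p0)
    return (successes - e1) ** 2 / e1 + (n - successes - e2) ** 2 / e2

n, females = 10, 4
for p0 in (0.2, 0.3, 0.5, 0.6):
    x = chi_square(females, n, p0)
    verdict = "significant" if x > 3.841 else "not significant"
    print(f"p = {p0}: chi-square = {x:.2f} ({verdict} at the 5% level)")
```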

1.17 — Summary of technical terms. In this chapter you have been
introduced to some of the main ideas in statistics, as well as to a number of
the standard technical terms. As a partial review and an aid to memory,
these terms are described again in this section. Since these descriptions
are not dictionary definitions, some would require qualification from a
more advanced viewpoint, but they are substantially correct.

Statistics deals with techniques for collecting, analyzing, and drawing 
conclusions from data. 

A sample is a small collection from some larger aggregate (the 
population) about which we wish information. 

Statistical inference is concerned with attempts to make quantitative 
statements about properties of a population from a knowledge of the 
results given by a sample. 

Attribute data are data that consist of a classification of the members 
of the sample into a limited number of classes on the basis of some 
property of the members (for instance, hair color). In this chapter, only 
samples with two classes have been studied. 

Measurement data are data recorded on some numerical scale. They 
are called discrete when only a restricted number of values occurs (for 
instance, 0, 1, 2, ..., 11 children). Strictly, all measurement data are dis-
crete, since the results of any measuring process are recorded to a limited
number of figures. But measurement data are called continuous if, con-
ceptually, successive values would differ only by tiny amounts.

A point estimate is a single number stated as an estimate of some quan- 
titative property of the population (for instance, 2.7% defective articles, 
58,300 children under five years). The quantity being estimated is often 
called a population parameter. 

An interval estimate is a statement that a population parameter has 
a value lying between two specified limits (the population contains be- 
tween 56,900 and 60,200 children under five years). 

A confidence interval is one type of interval estimate. It has the fea-
ture that in repeated sampling a known proportion (for instance, 95%) 
of the intervals computed by this method will include the population 
parameter. 
Random sampling, in its simplest form, is a method of drawing a
sample such that any member of the population has an equal chance of
appearing in the sample, independently of the other members that happen
to fall in the sample.

Tables of random digits are tables in which digits 0, 1, 2, ... 9 have 
been drawn by some process that gives each digit an equal chance of 
being selected at any draw. 

The sampled population is the population of which our data are a 
random sample. It is an aggregate such that the process by which we 
obtained our sample gives every member of the aggregate a known chance 
of appearing in the sample, and is the population to which statistical 
inferences from the sample apply. In practice, the sampled population is 
sometimes hypothetical rather than real, because the only available data
may not have been drawn at random from a known population. In 
meteorological research, for instance, the best data might be weather 
records for the past 40 years, which are not a randomly selected sample 
of years. 

The target population is the aggregate about which the investigator
is trying to make inferences from his sample. Although this term is not
in common use, it is sometimes helpful in focussing attention on differ-
ences between the population actually sampled and the population that
we are attempting to study. 

In a frequency distribution, the values in the sample are grouped into
a limited number of classes. A table is made showing the class boundaries 
and the frequencies (number of members of the sample) in each class. 
The purpose is to obtain a compact summary of the data. 

The binomial distribution gives the probabilities that 0, 1, 2, ... n 
members of a sample of size n will possess some attribute, when the sample 
is a random sample from a population in which a proportion p of the 
members possess this attribute. 

A null hypothesis is a specific hypothesis about a population that is 
being tested by means of the sample results. In this chapter the only
hypothesis considered was that the proportion of the population having some
attribute has a stated numerical value. 

A test of significance is, in general terms, a calculation by which the 
sample results are used to throw light on the truth or falsity of a null 
hypothesis. A quantity called a test criterion is computed: it measures
the extent to which the sample departs from the null hypothesis in some
relevant aspect. If the value of the test criterion falls beyond certain
limits into a region of rejection, the departure is said to be statistically
significant or, more concisely, significant. Tests of significance have the
property that if the null hypothesis is true, the probability of obtaining a 
significant result has a known value, most commonly 0.05 or 0.01. This 
probability is the significance level of the test. 

Chi-square = Σ(Observed − Expected)²/Expected is a test criterion
for the null hypothesis that the proportion with some attribute in the
population has a specified value. Large values of chi-square are
significant. The chi-square criterion serves many purposes and will appear
later for testing other null hypotheses.

Errors of the first and second kinds. In the Neyman-Pearson theory 
of tests of hypotheses, an error of the first kind is the rejection of the null 
hypothesis when it is true, and an error of the second kind is the acceptance 
of a null hypothesis that is false. In practice, in deciding whether to re- 
ject a null hypothesis or to regard it as provisionally true, all available 
evidence should be reviewed as well as the specific result of the test of 
significance. 
2.1 — Normally distributed population. In the first chapter, sampling
was mostly from a population with only two kinds of individuals: odd or
even, alive or dead, infested or free. Random samples of n from such a 
population made up a binomial distribution. The variable, an enumera- 
tion of successes, was discrete. Now we turn to another kind of population 
whose individuals are measured for some characteristic such as height or 
yield or income. The variable flows without a break from one individual 
to the next — a continuous variable with no limit to the number of indi- 
viduals with different measurements. Such variables are distributed in 
many ways, but we shall be occupied first with the normal distribution. 

Next to the binomial, the normal distribution was the earliest to be 
developed. De Moivre published its equation in 1733, twenty years after 
Bernoulli had given a comprehensive account of the binomial. That the 
two are not unrelated is clear from figure 2.1.1. On the top is the graph 
of a symmetrical binomial distribution similar to that in figure 1.9.1. In 
this new figure the sample size is 48 and the population sampled has equal 
numbers of the two kinds of individuals. Although discrete, the binomial 
is here graphed as a histogram. That is, the ordinate at 25 successes is 
represented by a horizontal bar going from 24.5 to 25.5. This facilitates 
comparison with the continuous normal curve. An indefinitely great 
number of samples were drawn so that the frequencies are expressed as 
percentages of the total. Successes less than 13 and more than 35 do occur,
but their frequencies are so small that they cannot be shown on the graph. 

Imagine now that the size of the sample is increased without limit, the 
width of the intervals on the horizontal axis being decreased correspond- 
ingly. The steps of the histogram would soon become so small as to look 
like the continuous curve at the right. Indeed, De Moivre discovered the 
normal distribution when seeking an approximation to the binomial. The 
discrete variable has become continuous and the frequencies have merged 
into each other without a break. 
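The approximation pictured in figure 2.1.1 can be checked numerically. A histogram bar of width 1 centred at k successes has area equal to the binomial probability of k, so its height can be compared directly with the normal curve having the binomial's mean np = 24 and standard deviation √(npq) = √12 ≈ 3.46. This Python sketch makes that comparison for n = 48, p = 1/2; the function names are illustrative, and the normal-curve formula is the one given later in this section.

```python
import math

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

n, p = 48, 0.5
mu = n * p                          # 24 successes
sigma = math.sqrt(n * p * (1 - p))  # sqrt(12), about 3.46

for k in (18, 21, 24, 27, 30):
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))

max_gap = max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma)) for k in range(n + 1))
print("largest difference in height:", round(max_gap, 4))
```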

This normal distribution is completely determined by two constants
or parameters. First, there is the mean, μ, which locates the center of the
distribution. Second, the standard deviation, σ, measures the spread or
variation of the individual measurements; in fact, σ is the scale (unit of
measurement) of the variable which is normally distributed.

FIG. 2.1.2 — Solid curve: the normal distribution with μ = 0 and σ = 1. Dotted
curve: the normal distribution with μ = 0 and σ = 1.5.

From the figure you see that within one sigma on either side of μ the
frequency is decreasing ever more rapidly but beyond that point it de-
creases at a continuously lesser rate. By the time the variable, X, has
reached ±3σ the percentage frequencies are negligibly small. Theoret-
ically, the frequency of occurrence never vanishes entirely, but it ap-
proaches zero as X increases indefinitely. The concentration of the
measurements close to μ is emphasized by the fact that over 2/3 of the
observations lie in the interval μ ± σ while some 95% of them are in the
interval μ ± 2σ. Beyond μ ± 3σ lies only 0.26% of the total frequency.
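These proportions hold for every normal curve, since they depend only on distance from μ measured in units of σ. The probability of lying within k standard deviations of the mean is erf(k/√2), which this short Python sketch evaluates (the name `prob_within` is illustrative). The computed tail beyond ±3σ is about 0.0027; the 0.26% quoted above agrees with it to the rounding of a four-decimal table.

```python
import math

def prob_within(k):
    """Probability that a normal variate lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = prob_within(k)
    print(f"within mu +/- {k} sigma: {inside:.4f}   outside: {1 - inside:.4f}")
```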

The formula for the ordinate or height of the normal curve is

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2/2\sigma^2}$$

where the quantity e = 2.7183 is the base for natural logarithms and π is
of course 3.1416. To illustrate the role of the standard deviation σ in
determining the shape of the curve, figure 2.1.2 shows two curves. The
solid curve has μ = 0, σ = 1, while the dotted curve has μ = 0, σ = 1.5.
The curve with the larger σ is lower at the mean and more spread out.
Values of X that are far from the mean are much more frequent with
σ = 1.5 than with σ = 1. In other words, the population is more variable
with σ = 1.5. A curve with σ = 1/2 would have a height of nearly 0.8 at
the mean and would have scarcely any frequency beyond X = 1.5.
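The ordinate formula translates directly into code. This Python sketch (the name `normal_ordinate` is illustrative) evaluates the height at the mean and at X = 1.5 for σ = 0.5, 1 and 1.5, and confirms that changing μ only moves the curve without altering its height.

```python
import math

def normal_ordinate(x, mu=0.0, sigma=1.0):
    """Height y of the normal curve with mean mu and standard deviation sigma at X = x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

for sigma in (0.5, 1.0, 1.5):
    at_mean = normal_ordinate(0.0, 0.0, sigma)
    at_1_5 = normal_ordinate(1.5, 0.0, sigma)
    print(f"sigma = {sigma}: height at mean {at_mean:.4f}, height at X = 1.5 {at_1_5:.4f}")

# Changing mu only moves the curve: the height at the new mean is unchanged.
print(normal_ordinate(2.0, mu=2.0) == normal_ordinate(0.0))
```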
To indicate the effect of a change in the mean μ, the curve with μ = 2,
σ = 1 is obtained by lifting the solid curve bodily and centering it at
X = 2 without changing its shape in any other way. This explains why μ
is called the parameter of location.

2.2 — Reasons for the use of the normal distribution. You may be 
wondering why such a model is presented since it obviously cannot de- 
scribe any real population. It is astonishing that this normal distribution 
has dominated statistical practice as well as theory. Briefly, the main
reasons are as follows: 

1. Convenience certainly plays a part. The normal distribution has 
been extensively and accurately tabulated, including many auxiliary re- 
sults that flow from it. Consequently if it seems to apply fairly well to a 
problem, the investigator has many time-saving tables ready at hand. 

2. The distributions of some variables are approximately normal, 
such as heights of men, lengths of ears of corn, and, more generally, many 
linear dimensions, for instance those of numerous manufactured articles. 

3. With measurements whose distributions are not normal, a simple 
transformation of the scale of measurement may induce approximate 
normality. The square root, √X, and the logarithm, log X, are often
used as transformations in this way. The scores made by students in
national examinations are frequently rescaled so that they appear to fol-
low a normal curve.

4. With measurement data, many investigations have as their purpose 
the estimation of averages — the average life of a battery, the average in- 
come of plumbers, and so on. Even if the distribution in the original
population is far from normal, the distribution of sample averages tends 
to become normal, under a wide variety of conditions, as the size of 
sample increases. This is perhaps the single most important reason for the 
use of the normal. 

5. Finally, many results that are useful in statistical work, although 
strictly true only when the population is normal, hold well enough for 
rough-and-ready use when samples come from non-normal populations. 
When presenting such results we shall try to indicate how well they stand 
up under non-normality. 
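Reason 4 can be illustrated by simulation. The Python sketch below assumes a strongly skewed population (an exponential distribution with mean 1, chosen only for illustration) and measures how far the distribution of sample averages is from normal by its skewness, which is 0 for a normal curve. For this population the skewness of averages of n is 2/√n, so the printed values should fall roughly from 2 toward 2/√30 ≈ 0.37 as n grows. The helper names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Mean cubed deviation divided by the cubed standard deviation (0 for a normal curve)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, m)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def sample_means(n, reps, rng):
    """Averages of `reps` samples of size n from an exponential population with mean 1."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(2)
for n in (1, 5, 30):
    print(f"sample size {n}: skewness of averages = {skewness(sample_means(n, 20000, rng)):.2f}")
```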

2.3 — Tables of the normal distribution. Since the normal curve de-
pends on the two parameters μ and σ, there are a great many different
normal curves. All standard tables of this distribution are for the dis-
tribution with μ = 0 and σ = 1. Consequently if you have a measurement
X with mean μ and standard deviation σ and wish to use a table of the
normal distribution, you must rescale X so that the mean becomes 0 and
the standard deviation becomes 1. The rescaled measurement is given
by the relation

$$Z = \frac{X - \mu}{\sigma}$$
The quantity Z goes by various names: a standard normal variate, a
standard normal deviate, a normal variate in standard measure, or, in educa-
tion and psychology, a standard score (although this term sometimes has
a slightly different meaning). To transform back from the Z scale to the
X scale, the formula is

$$X = \mu + \sigma Z$$

There are two principal tables. 

Table of ordinates. Table A 2 (p. 547) gives the ordinates or heights of the
standard normal distribution. The formula for the ordinate is

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-Z^2/2}$$
These ordinates are used when graphing the normal curve. Since the 
curve is symmetrical about the origin, the heights are presented only for 
positive values of Z. Here is a worked example. 

EXAMPLE 1 — Suppose that we wish to sketch the normal curve for a variate X that
has μ = 3 and σ = 1.6. What is the height of this curve at X = 2?

Step 1. Find Z = (2 − 3)/1.6 = −0.625.

Step 2. Read the ordinate in table A 2 for Z = 0.625. (Since the curve is symmetrical, the
ordinate at Z = −0.625 is the same as at Z = 0.625.) In the table, the Z entries are given
to two decimal places only. For Z = 0.62 the ordinate is 0.3292 and for Z = 0.63 the ordinate
is 0.3271. Hence we take 0.328 for Z = 0.625.

Step 3. Finally, divide the ordinate 0.328 by σ, getting 0.328/1.6 = 0.205 as the answer.
This step is needed because if you look back at the formula in section 2.1 for the ordinate
of the general normal curve, you will see a σ in the denominator that does not appear in the
tabulated curve.
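Example 1 can be reproduced without a table lookup by evaluating the standard ordinate formula directly. In this Python sketch (the name `standard_ordinate` is illustrative) the direct value at Z = 0.625 is about 0.3282, in agreement with the interpolated 0.328, and dividing by σ gives the same answer, 0.205.

```python
import math

def standard_ordinate(z):
    """Height of the standard normal curve (the quantity tabulated in table A 2)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

mu, sigma, x = 3.0, 1.6, 2.0
z = (x - mu) / sigma                  # Step 1: Z = -0.625
y_std = standard_ordinate(abs(z))     # Step 2: the curve is symmetrical, so use |Z|
height = y_std / sigma                # Step 3: divide by sigma
print(round(z, 3), round(y_std, 4), round(height, 3))
```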

Table of the cumulative distribution. Table A 3 (p. 548) is much more
frequently used than Table A 2. This gives, for any positive value of Z,
the area under the curve from the origin up to the point Z. It shows, for
any positive Z, the probability that a variate drawn at random from the
standard normal distribution will have a value lying between 0 and Z.
The word cumulative is used because if we think of the frequency dis-
tribution of a very large sample, with many classes, the area under the
curve represents the total or cumulative frequency in all classes lying be-
tween 0 and Z, divided by the total sample size so as to give a cumulative
relative frequency. In the limit, as the sample size increases indefinitely,
this becomes the probability that a randomly drawn member lies between
0 and Z.

As a reminder the area tabulated in table A 3 is shown in figure 2.3.1.
Since different people have tabulated different types of area under the
normal curve, it is essential, when starting to use any table, to understand
clearly what area has been tabulated.

First, a quick look at table A 3. At Z = 0 the area is, of course, zero.
At Z = 3.9, or any larger value, the area is 0.5000 to four decimal places.
It follows that the probability of a value of Z lying between −3.9 and
+3.9 is 1.0000 to four decimals, remembering that the curve is sym-
metrical about the origin. This means that any value drawn from a stan-
dard normal distribution is practically certain to lie between −3.9 and
+3.9. At Z = 1.0, the area is 0.3413. Thus the probability of a value
lying between −1 and +1 is 0.6826. This verifies a previous remark
(section 2.1) that over 2/3 of the observations in a general normal distribu-
tion lie in the interval μ ± σ. Similarly, for Z = 2 the area is 0.4772, cor-
responding to the result that about 95% of the observations (more ac-
curately 95.44%) will lie between μ − 2σ and μ + 2σ.

FIG. 2.3.1 — The shaded area is the area tabulated in table A 3 for positive values of Z.
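The quantity listed in table A 3 can also be computed directly as A = ½ erf(Z/√2). This Python sketch (the name `area_0_to_z` is illustrative) reproduces the entries quoted above. Direct values can differ from doubled table entries in the last place; for example, 2 × 0.4772 = 0.9544, while 2A computed directly rounds to 0.9545.

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (what table A 3 lists)."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z in (1.0, 2.0, 3.9):
    a = area_0_to_z(z)
    print(f"Z = {z}: A = {a:.4f}, probability between -Z and +Z = {2 * a:.4f}")
```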

When using table A 3 you will often want probabilities represented 
by areas different from those tabulated. If A is the area in table A 3, the 
following table shows how to obtain the probabilities most commonly 
needed. 

TABLE 2.3.1

Formulas for Finding Probabilities Related to the Normal Distribution

Probability of a Value                        Formula
(1) Lying between 0 and Z                     A
(2) Lying between -Z and Z                    2A
(3) Lying outside the interval (-Z, Z)        1 - 2A
(4) Less than Z (Z positive)                  0.5 + A
(5) Less than Z (Z negative)                  0.5 - A
(6) Greater than Z (Z positive)               0.5 - A
(7) Greater than Z (Z negative)               0.5 + A
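For readers who want to check these formulas by machine, here is a minimal Python sketch. It assumes A is computed from the standard library's `math.erf`, since the area from 0 to Z under the standard normal curve equals erf(Z/√2)/2. The names `table_a3_area` and `normal_probabilities` are our own, not part of any library, and formulas (4) to (7) are folded into two entries that branch on the sign of Z.

```python
import math

def table_a3_area(z: float) -> float:
    """A: area under the standard normal curve between 0 and |z|, as in table A 3."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

def normal_probabilities(z: float) -> dict:
    """Apply the formulas of table 2.3.1; (4)-(7) branch on the sign of z."""
    a = table_a3_area(z)
    return {
        "between 0 and Z": a,                              # formula (1)
        "between -Z and Z": 2 * a,                         # formula (2)
        "outside (-Z, Z)": 1 - 2 * a,                      # formula (3)
        "less than Z": 0.5 + a if z >= 0 else 0.5 - a,     # formulas (4), (5)
        "greater than Z": 0.5 - a if z >= 0 else 0.5 + a,  # formulas (6), (7)
    }

print(f"{table_a3_area(1.0):.4f}")  # 0.3413
print(f"{table_a3_area(2.0):.4f}")  # 0.4772
for name, p in normal_probabilities(-1.0).items():
    print(f"{name:18s} {p:.4f}")
```

Because erf is computed rather than read from a table, the last digit can occasionally differ from a sum of two four-decimal table entries.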


Verification of these formulas is left as an exercise. A few more
complex examples will be worked:

EXAMPLE 2 — What is the probability that a normal deviate lies between -1.62 and
+0.28? We have to split the interval into two parts: from -1.62 to 0, and from 0 to 0.28.
From table A 3, the areas for the two parts are, respectively, 0.4474 and 0.1103, giving 0.5577
as the answer.
EXAMPLE 3 - What is the probability that a normal deviate lies between -2.67 and
-0.59? In this case we take the area from -2.67 to 0, namely 0.4962, and subtract from it the
area from -0.59 to 0, namely 0.2224, giving 0.2738.
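Examples 2 and 3 can be reproduced with `statistics.NormalDist` from the Python standard library (version 3.8 or later), whose `cdf` method gives the area to the left of a point. This sketch works with cumulative areas directly rather than splitting each interval at 0, so the last digit may differ slightly from a table-based answer because of rounding in the table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Example 2: area from -1.62 to +0.28
ex2 = std_normal.cdf(0.28) - std_normal.cdf(-1.62)

# Example 3: area from -2.67 to -0.59 (both limits on the same side of 0)
ex3 = std_normal.cdf(-0.59) - std_normal.cdf(-2.67)

print(f"Example 2: {ex2:.4f}")
print(f"Example 3: {ex3:.4f}")
```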

EXAMPLE 4 — The heights of a large sample of men were found to be approximately 
normally distributed with mean = 67.56 inches and standard deviation = 2.57 inches. 
What proportion of the men have heights less than 5 feet 2 inches? We must first find Z. 


$$Z = \frac{62 - 67.56}{2.57} = -2.163$$


The probability wanted is the probability of a value less than Z, where Z is negative. We
use formula (5) in table 2.3.1. Reading table A 3 at Z = 2.163, we get A = 0.4847, interpolating
mentally between Z = 2.16 and Z = 2.17. From formula (5), the answer is 0.5 - A,
or 0.0153. About 1.5% of the men have heights less than 5 ft. 2 in.
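The same calculation in Python is a short sketch with `statistics.NormalDist(mu, sigma)`. That class standardizes internally, so its `cdf(62)` equals the area below Z = -2.163 on the standard curve. The variable names are our own.

```python
from statistics import NormalDist

mean, sd = 67.56, 2.57            # heights in inches (example 4)
cutoff = 5 * 12 + 2               # 5 ft. 2 in. = 62 inches

z = (cutoff - mean) / sd
proportion = NormalDist(mu=mean, sigma=sd).cdf(cutoff)   # equals NormalDist().cdf(z)
print(f"Z = {z:.3f}, proportion below = {proportion:.4f}")
```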

EXAMPLE 5 — What height is exceeded by 5% of the men? The first step is to find Z.
We use formula (6) in table 2.3.1, writing 0.5 - A = 0.05, so that A = 0.45. We now look
in table A 3 for the value of Z such that A = 0.45. The value is Z = 1.645. Hence the actual
height is

$$Y = \mu + \sigma Z = 67.56 + (2.57)(1.645) = 71.79 \text{ inches},$$

just under 6 feet.
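Going the other way, from a probability to a value, is what `NormalDist.inv_cdf` does in the Python standard library. Here is a short sketch reproducing example 5. The height exceeded by 5% of the men is the 95th percentile of the distribution.

```python
from statistics import NormalDist

heights = NormalDist(mu=67.56, sigma=2.57)

z = NormalDist().inv_cdf(0.95)     # standard deviate with 5% of the area above it
height = heights.inv_cdf(0.95)     # same point on the inches scale: mu + sigma * z
print(f"Z = {z:.3f}, height = {height:.2f} inches")
```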

Some examples to be worked by the reader follow:

EXAMPLE 2.3.1 — Using table A 2, (i) at the origin, what is the height of a normal curve
with σ = 2? (ii) For any normal curve, at what value of X is the height of the curve one-tenth
of the height at the origin? Ans. (i) 0.1994; (ii) at the value X = μ + 2.15σ.

EXAMPLE 2.3.2 — Using table A 3, show that 92.16% of the items in a normally distributed
population lie between -1.76σ and +1.76σ.

EXAMPLE 2.3.3 — Show that 65.24% of the items in a normal population lie between
μ - 1.1σ and μ + 0.8σ.

EXAMPLE 2.3.4 — Show that 13.59% of the items lie between Z = 1 and Z = 2. 

EXAMPLE 2.3.5 — Show that half the population lies in the interval from μ - 0.6745σ
to μ + 0.6745σ. The deviation 0.6745σ, formerly much used, is called the probable error
of X. Ans. You will have to use interpolation. You are seeking a value of Z such that the
area from 0 to Z is 0.2500. Z = 0.67 gives 0.2486 and Z = 0.68 gives 0.2517. Since
0.2500 - 0.2486 = 0.0014 and 0.2517 - 0.2486 = 0.0031, we need to go 14/31 of the distance
from 0.67 to 0.68. Since 14/31 = 0.45, the interpolate is Z = 0.6745.
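The interpolation in example 2.3.5 is ordinary linear interpolation between two adjacent entries of table A 3. A small sketch follows, with a function name (`interpolate_z`) of our own choosing. The table entries are the ones quoted in the example, and `statistics.NormalDist().inv_cdf(0.75)` supplies the untabulated value for comparison.

```python
from statistics import NormalDist

def interpolate_z(target, z_lo, a_lo, z_hi, a_hi):
    """Linear interpolation in table A 3: the Z whose tabulated area would be `target`."""
    fraction = (target - a_lo) / (a_hi - a_lo)   # 0.0014 / 0.0031 in example 2.3.5
    return z_lo + fraction * (z_hi - z_lo)

z_interp = interpolate_z(0.2500, 0.67, 0.2486, 0.68, 0.2517)
z_exact = NormalDist().inv_cdf(0.75)   # area 0.25 between 0 and Z means 0.75 below Z
print(f"{z_interp:.4f} {z_exact:.4f}")  # both print as 0.6745
```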

EXAMPLE 2.3.6 — Show that 1% of the population lies outside the limits Z = ±2.575.

EXAMPLE 2.3.7 — For the heights of men, with μ = 67.56 inches and σ = 2.57 inches,
what percentage of the population has heights lying between 5 feet 5 inches and 5 feet 10
inches? Compute your Z's to two decimals only. Ans. 67%.

EXAMPLE 2.3.8 — The specification for a manufactured component is that the pressure
at a certain point must not exceed 30 pounds. A manufacturer who would like to enter
this market finds that he can make components with a mean pressure μ = 28 lbs., but the
pressure varies from one specimen to another with a standard deviation σ = 1.6 lbs. What
proportion of his specimens will fail to meet the specification? Ans. 10.6%.

EXAMPLE 2.3.9 — By quality control methods it may be possible to reduce σ in the
previous example while keeping μ at 28 lbs. If the manufacturer wishes only 2% of his
specimens to be rejected, what must his σ be? Ans. 0.98 lbs.
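Once worked by hand, several of these answers can be checked with the sketch below, which uses `statistics.NormalDist` in place of table A 3. The helper name `area` is our own. Because the areas are computed rather than read from a four-decimal table, small differences in the last place are expected. For example 2.3.9, the exact deviate for 2% is about 2.054 rather than the table reading of 2.05, so the computed σ comes out near 0.974, slightly below the rounded answer of 0.98.

```python
from statistics import NormalDist

STD = NormalDist()                    # standard normal: mean 0, sd 1

def area(z):
    """Table A 3 area between 0 and |z|."""
    return STD.cdf(abs(z)) - 0.5

ans_232 = 2 * area(1.76)                          # between -1.76 and +1.76
ans_233 = area(1.1) + area(0.8)                   # between mu - 1.1 sigma and mu + 0.8 sigma
ans_234 = area(2) - area(1)                       # between Z = 1 and Z = 2
z_low = round((65 - 67.56) / 2.57, 2)             # Z's to two decimals, as instructed
z_high = round((70 - 67.56) / 2.57, 2)
ans_237 = area(z_low) + area(z_high)              # z_low < 0 < z_high
ans_238 = 0.5 - area((30 - 28) / 1.6)             # formula (6): above 30 lbs.
sigma_239 = (30 - 28) / STD.inv_cdf(0.98)         # sigma giving 2% above 30 lbs.

for label, value in [("2.3.2", ans_232), ("2.3.3", ans_233), ("2.3.4", ans_234),
                     ("2.3.7", ans_237), ("2.3.8", ans_238)]:
    print(f"Example {label}: {100 * value:.2f}%")
print(f"Example 2.3.9: sigma = {sigma_239:.3f} lbs.")
```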
2.4 — Estimators of μ and σ. While μ and σ are seldom known, they
may be estimated from random samples. To illustrate the estimation of
the parameters, we turn to the data reported from a study. In 1936 the
Council on Foods of the American Medical Association sampled the
vitamin C content of commercially canned tomato juice by analyzing a
specimen from each of the 17 brands that displayed the seal of the Council
(1). The vitamin C concentrations in mg. per 100 gm. are as follows
(slightly altered for easier use):

16, 22, 21, 20, 23, 21, 19, 15, 13, 23, 17, 20, 29, 18, 22, 16, 25

Estimation of μ. Assuming random sampling from a normal population,
μ is estimated by an average called the mean of the sample or, more
briefly, the sample mean. This is calculated by the familiar process of
dividing the sum of the observations, X, by their number. Representing
the sample mean by X̄,

X̄ = 340/17 = 20 mg. per 100 grams of juice

The symbol X̄ is often called "bar-X" or "X-bar." We say that this
sample mean is an estimator of μ or that μ is estimated by it.

Estimation of σ. The simplest estimator of σ is based on the range of
the sample observations, that is, the difference between the largest and
smallest measurements. For the vitamin C data,

range = 29 - 13 = 16 mg./100 gm.

From the range, σ is estimated by means of a multiplier which depends
on the sample size. The multiplier is shown in the column headed
"σ/Range" in table 2.4.1 (2, 3). For n = 17, halfway between 16 and 18,
the multiplier is 0.279, so that

σ is estimated by (0.279)(16) = 4.46 mg./100 gm.
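Here is a sketch of the range method in Python for the vitamin C data of this section. The dictionary holds only the entries of table 2.4.1 for n = 10 to 20. Averaging the two neighbouring multipliers for an odd n follows the "halfway between 16 and 18" step above. The names are our own.

```python
# Vitamin C concentrations (mg. per 100 gm.) as given in section 2.4
vitamin_c = [16, 22, 21, 20, 23, 21, 19, 15, 13, 23, 17, 20, 29, 18, 22, 16, 25]

# Part of the sigma/Range column of table 2.4.1 (n = 10 to 20 only)
SIGMA_PER_RANGE = {10: 0.325, 12: 0.307, 14: 0.294, 16: 0.283, 18: 0.275, 20: 0.268}

def multiplier(n):
    """sigma/Range for n; for an odd n between entries, take the value halfway."""
    if n in SIGMA_PER_RANGE:
        return SIGMA_PER_RANGE[n]
    return (SIGMA_PER_RANGE[n - 1] + SIGMA_PER_RANGE[n + 1]) / 2

n = len(vitamin_c)
mean = sum(vitamin_c) / n                        # 340 / 17
data_range = max(vitamin_c) - min(vitamin_c)     # 29 - 13
sigma_hat = multiplier(n) * data_range
print(f"mean {mean:.1f}, range {data_range}, sigma estimate {sigma_hat:.2f}")
```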

Looking at table 2.4.1 you will notice that the multiplier decreases as
n becomes larger. This is because the sample range tends to increase as
the sample size increases, although the population σ remains unchanged.
Clearly if we start with a sample of size 2 and keep adding to it, the range
must either stay constant or go up with each addition.

Quite easily, then, we have made a point estimate of each parameter of 
a normal population; these estimators constitute a summary of the infor- 
mation contained in the sample. The sample mean cannot be improved 
upon as an estimate of μ, but we shall learn to estimate σ more efficiently.
Also we shall learn about interval estimates and tests of hypotheses. Be- 
fore doing so, it is worthwhile to examine our sample in greater detail. 

The first point to be clarified is this: What population was repre- 
sented by the sample of 17 determinations of vitamin C? We raised this 
question tardily; it is the first one to be considered in analyzing any sam- 
pling. The report makes it clear that not all brands were sampled, only 
the seventeen allowed to display the seal of the Council. The dates of the
packs were mostly August and September of 1936, about a year before the
analyses were made. The council report states that the vitamin concentration
"may be expected to vary according to the variety of the fruit, the
conditions under which the crop has been grown, the degree of ripeness
and other factors." About all that can be said, then, is that the sampled
population consisted of those year-old containers still available to the 17
selected packers.

TABLE 2.4.1

Ratio of σ to Range in Samples of n From the Normal Distribution. Efficiency
of Range as Estimator of σ. Number of Observations With
Range to Equal 100 With s

 n    σ/Range   Relative Efficiency   Number per 100
 2    0.886     1.000                 100
 3    0.591     0.992                 101
 4    0.486     0.975                 103
 5    0.430     0.955                 105
 6    0.395     0.933                 107
 7    0.370     0.912                 110
 8    0.351     0.890                 112
 9    0.337     0.869                 115
10    0.325     0.850                 118
12    0.307     0.815                 123
14    0.294     0.783                 128
16    0.283     0.753                 133
18    0.275     0.726                 138
20    0.268     0.700                 143
30    0.245     0.604                 166
40    0.231     0.536                 186
50    0.222     0.490                 204

2.5 — The array and its graphical representation. Some of the more 
intimate features of a sample are shown by arranging the observations in 
order of size, from low to high, in an array. The array of vitamin contents 
is like this: 


13, 15, 16, 16, 17, 18, 19, 20, 20, 21, 21, 22, 22, 23, 23, 25, 29 


For a small sample the array serves some of the same purposes as the fre- 
quency distribution of a large one. 

The range, from 13 to 29, is now obvious. Also, attention is attracted
to the concentration of the measures near the center of the array and to 
their thinning out at the extremes. In this way the sample may reflect the 
distribution of the normal population from which it was drawn. But the 
smaller the sample, the more erratic its reflection may be. 

In looking through the vitamin C contents of the several brands, one is 
struck by their variation. What are the causes of this variation? Different
processes of manufacture, perhaps, and different sources of the fruit. 
Doubtless, also, the specimens examined, being themselves samples of 
their brands, differed from the brand means. Finally, the laboratory 
technique of evaluation is never perfectly accurate. Variation is the 
essence of statistical data. 

Figure 2.5.1 is a graphical representation of the foregoing array of 17
vitamin determinations. A dot represents each item. The distance of the
dot from the vertical line at the left, proportional to the concentration
of ascorbic acid in a brand specimen, is read in milligrams per 100 grams
on the horizontal scale.

Fig. 2.5.1 Graphical representation of an array. Vitamin C data.

The diagram brings out vividly not only the variation and the con- 
centration in the sample, but also two other characteristics: (i) the rather 
symmetrical occurrence of the values above and below the mean, and 
(ii) the scarcity of both extremely small and extremely large vitamin C 
contents, the bulk of the items being near the middle of the set. These 
features recur with notable persistence in samples from normal distribu- 
tions. For many variables associated with living organisms there are 
averages and ranges peculiar to each, reflecting the manner in which each 
seems to express itself most successfully. These norms persist despite the 
fact that individuals enjoy a considerable freedom in development. A 
large part of our thinking is built around ideas corresponding to such 
statistics. Each of the words, pig, daisy, man, raises an image which is
quantitatively described by summary numbers. It is difficult to conceive 
of progress in thought until memories of individuals are collected into 
concepts like averages and ranges of distributions. 

2.6 — Algebraic notation. The items in any set may be represented by

$$X_1, X_2, \ldots, X_n,$$

where the subscripts 1, 2, ..., n may specify position in the set of n items
(not necessarily an array). The three dots accompanying these symbols
are read "and so on." Matching the symbols with the values in section 2.4,

$$X_1 = 16, \quad X_2 = 22, \quad \ldots, \quad X_{17} = 25 \text{ mg./100 gm.}$$

The sample mean is represented by X̄, so that

$$\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$$

This is condensed into the form

$$\bar{X} = \Sigma X / n,$$

where X stands for every item successively. The symbol ΣX is read
"summation X" or "sum of the X's." Applying this formula to the vitamin
C concentrations,

ΣX = 340, and X̄ = 340/17 = 20 mg./100 gm.

2.7 — Deviations from sample mean. The individual variations of
the items in a set of data may be well expressed by the deviations of these
items from some centrally located number such as the sample mean.
For example, the deviation-from-mean of the first X-value is

16 - 20 = -4 mg. per 100 gm.

That is, this specimen falls short of X̄ by 4 mg./100 gm. Of special interest
is the whole set of deviations calculated from the array in section 2.5:

-7, -5, -4, -4, -3, -2, -1, 0, 0, 1, 1, 2, 2, 3, 3, 5, 9

These deviations are represented graphically in figure 2.5.1 by the distances
of the dots from the vertical line drawn through the sample mean.

Deviations are almost as fundamental in our thinking as are averages. 
“What a whale of a pig” is a metaphor expressing astonishment at the 
deviation of an individual’s size from the speaker’s concept of the normal. 
Gossip and news are concerned chiefly with deviations from accepted 
standards of behavior. Curiously, interest is apt to center in departures 
from norm, rather than in that background of averages against which the 
departures achieve prominence. Statistically, freaks are freaks only 
because of their large deviations.

Deviations are represented symbolically by lower case letters. That
is:

$$x_1 = X_1 - \bar{X}, \quad x_2 = X_2 - \bar{X}, \quad \ldots, \quad x_n = X_n - \bar{X}$$

Just as X may represent any of the items in a set, or all of them in succession,
so x represents deviations from sample mean. In general,

$$x = X - \bar{X}$$

It is easy to prove the algebraic result that the sum of a set of deviations
from the mean is zero; that is, Σx = 0. Look at the set of deviations
x₁ = X₁ - X̄, and so on (foot of p. 42). Instead of adding the column of
values xᵢ, we can obtain the same result by adding the column of values Xᵢ
and subtracting the sum of the column of values X̄. The sum of the column
of values Xᵢ is the expression ΣX. Further, since there are n items in a
column, the sum of the column of values X̄ is just nX̄. Thus we have the result

$$\Sigma x = \Sigma X - n\bar{X}$$

But the mean X̄ = ΣX/n, so that nX̄ = ΣX, and the right-hand side is
zero. It follows from this theorem that the mean of the deviations is also
zero.

This result is useful in proving several standard statistical formulas.
When it is applied to a specific sample of data, there is a slight snag. If
the sample mean X̄ does not come out exactly, we have to round it. As a
result of this rounding, the numerical sum of the deviations will not be
exactly zero. Consider a sample with the values 1, 7, 8. The mean is
16/3, which we might round to 5.3. The deviations are then -4.3, +1.7,
and +2.7, adding to +0.1. Thus in practice the sum of the deviations is
zero, apart from rounding errors.
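The rounding snag can be seen directly. In this Python sketch the exact mean 16/3 is kept as a `fractions.Fraction`, so the deviations sum to exactly zero. With the mean rounded to 5.3, a residual of about +0.1 remains.

```python
from fractions import Fraction

sample = [1, 7, 8]
n = len(sample)

exact_mean = Fraction(sum(sample), n)            # 16/3, no rounding
exact_devs = [x - exact_mean for x in sample]

rounded_mean = round(sum(sample) / n, 1)         # 5.3
rounded_devs = [round(x - rounded_mean, 1) for x in sample]

print(sum(exact_devs))                           # 0
print(rounded_devs, round(sum(rounded_devs), 1))
```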

EXAMPLE 2.7.1 - The weights of 12 staminate hemp plants in early April at College
Station, Texas (9), were approximately:

13, 11, 16, 5, 3, 18, 9, 9, 8, 6, 27, and 7 grams

Array the weights and represent them graphically. Calculate the sample mean, 11 grams,
and the deviations therefrom. Verify the fact that Σx = 0. Show that σ is estimated by 7.4
grams.

EXAMPLE 2.7.2 - The heights of 11 men are 64, 70, 65, 69, 68, 67, 68, 67, 66, 72, and
61 inches. Compute the sample mean and verify it by summing the deviations. Are the
numbers of positive and negative deviations equal, or only their sums?

EXAMPLE 2.7.3 — The weights of 11 forty-year-old men were 148, 154, 158, 160, 161,
162, 166, 170, 182, 195, and 236 pounds. Notice the fact that only three of the weights
exceed the sample mean. Would you expect weights of men to be normally distributed?

EXAMPLE 2.7.4 — In a sample of 48 observations you are told that the standard devia- 
tion has been computed and is 4.0 units. Glancing through the data, you notice that the 
lowest observation is 39 and the highest 76. Does the reported standard deviation look 
reasonable? 

EXAMPLE 2.7.5 - Ten patients troubled with sleeplessness each received a nightly
dose of a sedative for one period, while in another period they received no sedative (4). The
average hours of sleep per night for each patient during each two-week period are as follows:

Patient     1    2    3    4    5    6    7    8    9    10
Sedative    1.3  1.1  6.2  3.6  4.9  1.4  6.6  4.5  4.3  6.1
None        0.6  1.1  2.5  2.8  2.9  3.0  3.2  4.7  5.5  6.2

Calculate the 10 differences (Sedative - None). Might these differences be a sample from
a normal population of differences? How would you describe this population? (You might
want to ask for more information.) Assuming that the differences are normally distributed,
estimate μ and σ for the population of differences. Ans. +0.75 hours and 1.72 hours.

EXAMPLE 2.7.6 — If you have two sets of data that are paired as in the preceding
example, and if you have calculated the resulting set of differences, prove algebraically that
the sample mean of the differences is equal to the difference between the sample means of the
two sets. Verify this result for the data in example 2.7.5.

2.8 — Another estimator of σ; the sample standard deviation. The
range, dependent as it is on only the two extremes in a sample, usually has
a more variable sampling distribution than an estimator based on the
whole set of deviations-from-mean in a sample, not just the largest and
smallest. What kind of average is appropriate to summarize these deviations,
and to estimate σ with the least sampling variation?

Clearly, the sample mean of the deviations is useless as an estimator 
because it is always zero. But a natural suggestion is to ignore the signs, 
calculating the sample mean of the absolute values of the deviations. The 
resulting measure of variation, the mean absolute deviation, had a considerable
vogue in times past. Now, however, we use another estimator, more
efficient and more flexible. 

The sample standard deviation. This estimator, denoted by s, is the most
widely used in statistical work. The formula defining s is

$$s = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{\Sigma x^2}{n - 1}}$$

First, each deviation is squared. Next, the sum of squares, Σx², is divided
by (n - 1), one less than the sample size. The result is the mean square
or sample variance, s². Finally, the extraction of the square root recovers
the original scale of measurement. For the vitamin C concentrations, the
calculations are set out in the right-hand part of table 2.8.1. Since the
sum of squares of the deviations is 254 and n is 17, we have

s² = 254/16 = 15.88

s = √15.88 = 3.98 mg./100 gm.
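The steps just described (square the deviations, divide by n - 1, take the square root) look like this in Python for the vitamin C data. The standard library's `statistics.stdev` uses the same n - 1 divisor, which the final assertion checks. This is a sketch of the arithmetic, not a replacement for working the examples by hand.

```python
import math
import statistics

vitamin_c = [16, 22, 21, 20, 23, 21, 19, 15, 13, 23, 17, 20, 29, 18, 22, 16, 25]

n = len(vitamin_c)
mean = sum(vitamin_c) / n
sum_sq = sum((x - mean) ** 2 for x in vitamin_c)   # sum of squared deviations
variance = sum_sq / (n - 1)                         # divisor n - 1 (degrees of freedom)
s = math.sqrt(variance)

print(sum_sq, f"{variance:.2f}", f"{s:.2f}")
assert math.isclose(s, statistics.stdev(vitamin_c))
```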

Before further discussion of s is given, its calculation should be fixed
in mind by working a couple of examples. Table A 18 is a table of square
roots. Hints on finding square roots are given on p '4!
EXAMPLE 2.8.1 — In five patients with pneumonia, treated with sodium penicillin G,
the numbers of days required to bring the temperature down to normal were 1, 4, 5, 7, 3.
Compute s for these data and compare it with the estimate based on the range. Ans. s = 2.24
days. Range estimate = 2.58 days.

EXAMPLE 2.8.2 — Calculate s for the hemp plant weights in example 2.7.1. Ans. 6.7
grams. Compare with your first estimate of σ.

The appearance of the divisor (n - 1) instead of n in computing s²
and s is puzzling at first sight. The reason cannot be explained fully at
this stage, being related to the computation of s from data of more complex
structure. The quantity (n - 1) is called the number of degrees of
freedom in s. Later in the book we shall meet situations in which the
number of degrees of freedom is neither n nor (n - 1), but some other
quantity. If the practice of using the degrees of freedom as divisor is followed,
there is the considerable advantage that the same statistical tables,
needed in important applications, serve for a wide variety of types of data.

Division by (n - 1) has one standard property that is often cited. If
random samples are drawn from any indefinitely large population (not
just a normally distributed one) that has a finite value of σ, then the average
value of s², taken over all random samples, is exactly equal to σ². Any
estimate whose average value over all possible random samples is equal
to the population parameter being estimated is called unbiased. Thus,
s² is an unbiased estimate of σ². This property, which says that on the
average the estimate gives the correct answer, seems a desirable one for an
estimate to possess. The property, however, is not as fundamental as
one might think, because s is not an unbiased estimate of σ. If we want
s to be an unbiased estimate of σ in normal populations, we must use a
divisor that is neither (n - 1) nor n.
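Both halves of this statement can be checked exactly, without any sampling error, by listing every possible sample from a tiny population. In the Python sketch below the population {1, 3, 8} is hypothetical, chosen only for illustration. Samples of size 2 are drawn with replacement, so that successive draws are independent, which stands in for sampling from an indefinitely large population. Exact `Fraction` arithmetic shows the average s² equal to σ², while the average s falls short of σ.

```python
from fractions import Fraction
from itertools import product
import math

population = [1, 3, 8]      # hypothetical small population
N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((Fraction(x) - mu) ** 2 for x in population) / N   # population variance

n = 2
samples = list(product(population, repeat=n))   # all ordered samples, with replacement

def sample_variance(sample):
    """s squared with the (n - 1) divisor, in exact arithmetic."""
    mean = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - mean) ** 2 for x in sample) / (len(sample) - 1)

avg_s2 = sum(sample_variance(smp) for smp in samples) / len(samples)
avg_s = sum(math.sqrt(sample_variance(smp)) for smp in samples) / len(samples)

print(avg_s2 == sigma_sq)            # True: s squared is unbiased for sigma squared
print(avg_s, math.sqrt(sigma_sq))    # the average s is smaller than sigma
```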

2.9 — Comparison of the two estimators of σ. You now have two estimators
of σ, one of them easier to calculate than the other, but less efficient.
You need to know what is meant by "less efficient" and what governs the
choice of estimate. Suppose that we draw a large number of random
samples of size 10 from a normal population. For each sample we can
compute the estimate of σ obtained from the range, and the estimate s.
Thus we can form two frequency distributions, one showing the distribution
of the range estimate, the other showing the distribution of s. The
distribution of s is found to be more closely grouped about σ; that is, s
usually gives a more accurate estimate of σ. Going a step further, it can
be shown that the range estimate, computed from normal samples of size
12, has roughly the same frequency distribution as that of s in samples of
size 10. We say that in samples of size 10 the relative efficiency of the range
estimator to s is about 10/12, or more accurately 0.850. The relative
efficiencies and the relative sample sizes appear in the third and fourth
columns of table 2.4.1 (p. 40). In making a choice we have to weigh the
cost of more observations. If observations are costly, it is cheaper to
compute s.
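The comparison described above can be imitated with a small sampling experiment, in the spirit of the experimental sampling used throughout this book. The Python sketch below draws repeated normal samples of size 10 with σ = 1 and records both the range estimate, using the multiplier 0.325 from table 2.4.1, and s. The seed, the number of trials, and the names are our own choices. It is a rough check rather than the calculation behind the efficiency column. Because s is slightly biased downward, the two spreads are not an exact measure of relative efficiency, but the range estimates should show the wider scatter.

```python
import random
import statistics

random.seed(1)                 # fixed seed so the run is repeatable
SIGMA = 1.0
N = 10
MULTIPLIER = 0.325             # sigma/Range for n = 10, from table 2.4.1
TRIALS = 20_000

range_estimates, s_estimates = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    range_estimates.append(MULTIPLIER * (max(sample) - min(sample)))
    s_estimates.append(statistics.stdev(sample))

for name, est in [("range", range_estimates), ("s", s_estimates)]:
    print(f"{name:5s} mean {statistics.fmean(est):.3f}  sd {statistics.stdev(est):.3f}")
```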

Actually, both estimators are extensively used. Note that the rela- 
tive efficiency of the range estimator remains high up to samples of sizes 
8 to 10. In many operations, σ is estimated in practice by combining the
estimates from a substantial number of small samples. For instance, in 
controlling the quality of an industrial process, small samples of the manu- 
factured product are taken out and tested frequently, say every 15 min- 
utes or every hour. Samples of size 5 are often used, the range estimator 
being computed from each sample and plotted on a time-chart. The 
efficiency of a single range estimate in a sample of size 5 is 0.955, and the 
average of a series of ranges has the same efficiency. 

The estimate from the range is an easy approximate check on the 
computation of s. In these days, electronic computing machines are used 
more and more for routine computations. Unless the investigator has 
learned how to program, one consequence is that the details of his com- 
putations are taken out of his hands. Errors in making the programmers 
understand what is wanted and errors in giving instructions to the ma- 
chines are common. There is therefore an increasing need for quick 
approximate checks on all the standard statistical computations, which the 
investigator can apply when his results are handed to him. If a table of 
σ/Range is not at hand, two rough rules may help. For samples up to size
10, divide the range by √n to estimate σ. Remember also:
If n is near      Then σ is roughly estimated
this number       by dividing range by
  5                 2
 10                 3
 25                 4
100                 5
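The two rough rules, together with the table multiplier and s itself, can be compared on the guinea pig weights of example 2.9.2 using the Python sketch below. Reading "near" as "closest tabulated n" is our own interpretation of the rule, and the function name `rough_sigma` is hypothetical.

```python
import math
import statistics

# Birth weights of 20 guinea pigs in grams (example 2.9.2)
weights = [30, 30, 26, 32, 30, 23, 29, 31, 36, 30,
           25, 34, 32, 24, 28, 27, 38, 31, 34, 30]

# Rough divisors from the table above. Reading "near" as "closest tabulated n"
# is our own interpretation of the rule.
ROUGH_DIVISORS = {5: 2, 10: 3, 25: 4, 100: 5}

def rough_sigma(data):
    """Quick estimate of sigma from the range, using the two rough rules."""
    n = len(data)
    r = max(data) - min(data)
    if n <= 10:
        return r / math.sqrt(n)              # range divided by sqrt(n)
    nearest = min(ROUGH_DIVISORS, key=lambda k: abs(k - n))
    return r / ROUGH_DIVISORS[nearest]

rough = rough_sigma(weights)                          # n = 20 -> nearest 25 -> divide by 4
from_table = 0.268 * (max(weights) - min(weights))    # sigma/Range for n = 20, table 2.4.1
s = statistics.stdev(weights)                         # divisor n - 1
print(f"rough {rough:.2f}, table {from_table:.2f}, s {s:.2f}")
```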


The range estimator and s are both sensitive to gross errors, because
a gross error is likely to produce a highest or lowest sample member that is
entirely false.

EXAMPLE 2.9.1 — In a sample of size 2, with measurements X₁ and X₂, show that s is
|X₁ - X₂|/√2 = 0.707|X₁ - X₂|, and that the range estimator is 0.886|X₁ - X₂|, where the
vertical lines denote the absolute value. The reason for the different multipliers is that the
range estimator is constructed to be an unbiased estimator of σ, while s is not, as already
mentioned.

EXAMPLE 2.9.2 — The birth weights of 20 guinea pigs were: 30, 30, 26, 32, 30, 23, 29,
31, 36, 30, 25, 34, 32, 24, 28, 27, 38, 31, 34, 30 grams. Estimate σ in 3 ways: (i) by the rough
approximation, one-fourth of the range (Ans. 3.8 gm.); (ii) by use of the fraction, 0.268, in
table 2.4.1 (Ans. 4.0 gm.); (iii) by calculating s (Ans. 3.85 gm.). N.B.: Observe the time
required to calculate s.

EXAMPLE 2.9.3 — In the preceding example, how many birth weights would be required
to yield the same precision if the range were used instead of s? Ans. about 29 weights.

EXAMPLE 2.9.4 — Suppose you lined up according to height 16 freshmen, then measured
the height of the shortest, 64 inches, and the tallest, 72 inches. Would you accept the
midpoint of the range, (64 + 72)/2 = 68 inches, as a rough estimate of μ, and 8/3 = 2.7
inches as a quick-and-easy estimate of σ?

EXAMPLE 2.9.5 - In a sample of 3 the values are, in increasing order, X₁, X₂, and X₃.
The range estimate of σ is 0.591(X₃ - X₁). If you are ingenious at algebra, show that s
always lies between (X₃ - X₁)/2 = 0.5(X₃ - X₁) and (X₃ - X₁)/√3 = 0.578(X₃ - X₁).
Verify the two extreme cases from the samples 0, 3, 6, in which s = 0.5(X₃ - X₁), and 0, 0, 6,
in which s = 0.578(X₃ - X₁).

2.10 — Hints on the computation of s. Two results in algebra help to
shorten the calculation of s. Both give quicker ways of finding Σx². If
G is any number, there is an algebraic identity to the effect that

$$\Sigma x^2 = \Sigma (X - \bar{X})^2 = \Sigma (X - G)^2 - (\Sigma X - nG)^2 / n$$

An equivalent alternative form is

$$\Sigma x^2 = \Sigma (X - \bar{X})^2 = \Sigma (X - G)^2 - n(\bar{X} - G)^2$$

These expressions are useful when s has to be computed without the aid of
a calculating machine (a task probably confined mainly to students nowadays).
Suppose the sample total is ΣX = 350 and n = 17. The mean X̄
is 350/17 = 20.59. If the X's are whole numbers, it is troublesome to take
deviations from a number like 20.59, and still more so to square the
numbers without a machine. The trick is to take G (sometimes called the
guessed or working mean) equal to 20. Find the deviations of the X's
from 20 and the sum of squares of these deviations, Σ(X - G)². To
get Σx², you have only to subtract n times the square of the difference
between X̄ and G, or, in this case, 17(0.59)² = 5.92.

Proof of the identity. We shall denote a typical value in the sample by
Xᵢ, where the subscript i goes from 1 to n. Write

$$X_i - G = (X_i - \bar{X}) + (\bar{X} - G)$$

Squaring both sides, we have

$$(X_i - G)^2 = (X_i - \bar{X})^2 + 2(X_i - \bar{X})(\bar{X} - G) + (\bar{X} - G)^2$$

We now add over the n members of the sample. In the middle term on
the right, the term 2(X̄ - G) is a constant multiplier throughout this
addition, since this term does not contain the subscript i that changes from
one member of the sample to another. Hence

$$\Sigma\, 2(X_i - \bar{X})(\bar{X} - G) = 2(\bar{X} - G)\, \Sigma (X_i - \bar{X}) = 0,$$

since, as we have seen previously, the sum of the deviations from the sample
mean is always zero. This gives

$$\Sigma (X_i - G)^2 = \Sigma (X_i - \bar{X})^2 + n(\bar{X} - G)^2,$$

noting that the sum of the constant term (X̄ - G)² over the sample is
n(X̄ - G)². Moving this term to the other side, we get

$$\Sigma (X_i - G)^2 - n(\bar{X} - G)^2 = \Sigma (X_i - \bar{X})^2$$

This completes the proof.

Incidentally, the result shows that for any value of G, Σ(Xᵢ - X̄)²
is always smaller than Σ(Xᵢ - G)², unless G = X̄. The sample mean has
the property that the sum of squares of deviations from it is a minimum.
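The identity, the minimum property, and the recovery of X̄ from a guessed mean can all be checked with exact arithmetic. The sketch below uses the numbers of example 2.10.1 and a few arbitrary guessed means G of our own choosing. Python's `Fraction` avoids any rounding, so each check is an exact equality.

```python
from fractions import Fraction

data = [15, 12, 10, 10, 10, 8, 7, 7, 4, 4, 1]   # numbers of example 2.10.1
n = len(data)
mean = Fraction(sum(data), n)
sum_x2 = sum((x - mean) ** 2 for x in data)     # deviations from the sample mean

for G in [0, 1, 5, 10, 20]:                     # arbitrary guessed means
    dev_G = [x - G for x in data]
    # Sigma(X - G)^2 - [Sigma(X - G)]^2 / n, the first form of the identity
    via_identity = sum(d * d for d in dev_G) - Fraction(sum(dev_G) ** 2, n)
    assert via_identity == sum_x2
    assert sum(d * d for d in dev_G) >= sum_x2   # the mean minimizes the sum of squares
    assert G + Fraction(sum(dev_G), n) == mean   # X-bar = G + Sigma(X - G)/n

print(mean, sum_x2)   # 8 160
```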

The second algebraic result, a particular case of the first, is used
when a calculating machine is available. Put G = 0 in the first result
in this section. We get

$$\Sigma x^2 = \Sigma (X - \bar{X})^2 = \Sigma X^2 - (\Sigma X)^2 / n$$

This result enables us to find Σx² without computing any of the deviations.
For a set of positive numbers Xᵢ, most calculating machines will compute
the sum of squares, ΣX², and the sum, ΣX, simultaneously, without
writing down any intermediate figures. To get Σx², we square the sum,
dividing by n, to give (ΣX)²/n, and subtract this from the original sum of
squares, ΣX². The computation will be illustrated for the 17 vitamin C
concentrations. Earlier, as mentioned, these data were altered slightly to
simplify the presentation. The actual determinations were as follows.

16, 22, 21, 20, 23, 22, 17, 15, 13, 22, 17, 18, 29, 17, 22, 16, 23

The only figures that need be written down are shown in table 2.10.1.
TABLE 2.10.1

Computing the Sample Mean and Sum of Squares of Deviations
With a Calculating Machine

n = 17                          ΣX² = 6,773
ΣX = 333                        (ΣX)²/n = 6,522.88
X̄ = 19.6 mg. per 100 gm.        Σx² = 250.12

s² = 250.12/16 = 15.63
s = √15.63 = 3.95


When using this method, remember that any constant number can
be subtracted from all the Xᵢ without changing s. Thus if your data are
numbers like 1032, 1017, 1005, and so on, they can be read as 32, 17, 5,
and so on, when following the method in table 2.10.1.
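The machine method of table 2.10.1 translates directly into code: form ΣX and ΣX², then subtract (ΣX)²/n. Here is a minimal Python sketch with a function name (`machine_sum_of_squares`) of our own choosing. In floating-point arithmetic, subtracting two large and nearly equal totals can lose precision when the data are large numbers with a small spread, which is one more reason to subtract a constant first, as suggested above.

```python
import math

# Actual (unaltered) vitamin C determinations, mg. per 100 gm.
actual = [16, 22, 21, 20, 23, 22, 17, 15, 13, 22, 17, 18, 29, 17, 22, 16, 23]

def machine_sum_of_squares(data):
    """Sum of squared deviations as sum(X^2) - (sum X)^2 / n; no deviations are formed."""
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    return total_sq - total * total / n

n = len(actual)
sum_x2 = machine_sum_of_squares(actual)
s = math.sqrt(sum_x2 / (n - 1))
print(sum(actual), sum(x * x for x in actual), f"{sum_x2:.2f}", f"{s:.2f}")

# Subtracting a constant from every value leaves the sum of squares unchanged.
shifted = [x - 10 for x in actual]
assert math.isclose(machine_sum_of_squares(shifted), sum_x2)
```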

EXAMPLE 2.10.1 — For those who need practice in using a guessed mean, here is a set
of numbers for easy computation:

15, 12, 10, 10, 10, 8, 7, 7, 4, 4, 1

First calculate X̄ = 8 and s = 4 by finding deviations from the sample mean. Then try
various guessed means, such as 5, 10, and 1. Continue until you convince yourself that the
answers, X̄ = 8 and s = 4, can be reached regardless of the value chosen for G. Finally,
try G = 0. Note: With a guessed mean, X̄ can be found without having to add the Xᵢ, by
the relation

$$\bar{X} = G + \Sigma (X - G)/n$$

where the quantity Σ(X - G) is the sum of your deviations from the guessed mean G.

EXAMPLE 2.10.2 — For the ten patients in a previous example, the average differences
in hours of sleep per night between sedative and no sedative were (in hours) 0.7, 0.0, 3.7,
0.8, 2.0, -1.6, 3.4, -0.2, -1.2, -0.1. With a calculating machine, compute s by the shortcut
method in table 2.10.1. Ans. s = 1.79 hrs. The range method gave 1.72 hrs.

EXAMPLE 2.10.3 — Without finding deviations from X̄ and without using a calculating
machine, compute Σx² for the following measurements: %1, 953, 970, 958, 950, 951 9 s' 1
Ans. 286.9.

2.11 — The standard deviation of sample means. With measurement 
data, as mentioned previously, the purpose of an investigation is often to 
estimate an average or total over a population (average selling price of 
houses in a town, total wheat crop in a region). If the data are a random 
sample from a population, the sample mean X̄ is used to estimate the corresponding
average over the population. Further, if the number of items
N in the population is known, the quantity NX̄ is an estimator of the population
total of the X's. This brings up the question: How accurate is a
sample mean as an estimator of the population mean? 

As usual, a question of this type can be examined either experimentally
or mathematically. With the experimental approach, we first find or
construct a population that seems typical of the type of population encountered
in our work. Suppose that we are particularly interested in
samples of size 100. We draw a large number of random samples of size
100, computing the sample mean X̄ for each sample. In this way we form
a frequency distribution of the sample means, or graph the frequencies
in a histogram. Since the mean of the population is known, we can find
out how often the sample mean is satisfactorily close to the population
mean, and how often it gives a poor estimate.

Much mathematical work has been done on this problem and it 
has produced two of the most exciting and useful results in the whole of 
statistical theory. These results, which are part of every statistician’s 
stock in trade, will be stated first. Some experimental verification will 
then be presented for illustration. The first result gives the mean and
standard deviation of X̄ in repeated sampling; the second gives the shape
of the frequency distribution of X̄.

Mean and standard deviation of $\bar{X}$. If repeated random samples of size $n$ are drawn from any population (not necessarily normal) that has mean $\mu$ and standard deviation $\sigma$, the frequency distribution of the sample means $\bar{X}$ in these repeated samples has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
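This result can be checked exactly, with no sampling error, on a very small population. The sketch below is an illustration only: the three-value population is hypothetical, each sample is taken as $n$ independent draws (that is, with replacement, so the population does not change from draw to draw), and exact fractions stand in for the frequency distribution of $\bar{X}$ over every possible sample.

```python
from fractions import Fraction
from itertools import product


def sampling_distribution_of_mean(population, n):
    """Enumerate every ordered sample of size n drawn with replacement.

    Returns (mu, sigma2, mean_of_means, var_of_means) as exact Fractions,
    where sigma2 is the population variance (divisor N, not N - 1).
    """
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

    # Each of the N**n ordered samples is equally likely.
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
    return mu, sigma2, mean_of_means, var_of_means


# Hypothetical, deliberately skewed population (not from the text).
mu, sigma2, m, v = sampling_distribution_of_mean([1, 2, 6], n=3)
print(mu, sigma2, m, v)  # 3 14/3 3 14/9
```

Here $\mu = 3$ and $\sigma^2 = 14/3$; the 27 possible sample means have mean exactly 3 and variance exactly $14/9 = \sigma^2/3$.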

This result says that under random sampling the sample mean $\bar{X}$ is an unbiased estimator of $\mu$: on the average, in repeated sampling, it will be neither too high nor too low. Further, the sample means have less variation about $\mu$ than the original observations. The larger the sample size, the smaller this variation becomes.

Students sometimes find it difficult to reach the point at which the phrase “the standard deviation of $\bar{X}$” has a concrete meaning for them. Once introduced to the idea of a standard deviation, they do not find it too hard to feel at home with a phrase like “the standard deviation of a man’s height,” because every day we see tall men and short men, and realize
that this standard deviation is a measure of the extent to which heights 
vary from one man to another. But usually when we have a sample, we 
calculate a single mean. Where does the variation come from? It is the 
variation that would arise if we drew repeated samples from the popula- 
tion that we are studying and computed the mean of each sample. The 
experimental samplings presented in this chapter and in chapter 3 may 
make this concept more realistic. 

The standard deviation of $\bar{X}$, $\sigma/\sqrt{n}$, is often called, alternatively, the standard error of $\bar{X}$. The terms “standard deviation” and “standard error” are synonymous. When we are studying the frequency distribution of an estimator like $\bar{X}$, its standard deviation supplies information about the amount of error in $\bar{X}$ when used to estimate $\mu$. Hence, the term “standard error” is rather natural. Normally, we would not speak of the
standard error of a man’s height, because if a man is unusually tall, this 
does not imply that he has made a mistake in his height. 

The quantity $N\bar{X}$, often used to estimate a total over the population, is also an unbiased estimator under random sampling. Since $N$ is simply a fixed number, the mean of $N\bar{X}$ in repeated sampling is $N\mu$, which, by definition of $\mu$, is the correct population total. The standard error of $N\bar{X}$ is $N\sigma/\sqrt{n}$. Another frequently used result is that the sample total, $\sum X = n\bar{X}$, has a standard deviation $n\sigma/\sqrt{n}$, or $\sigma\sqrt{n}$.
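These formulas can be written as a small sketch, assuming $\sigma$ is known; the function names are illustrative and not from the text. The usage line plugs in the figures of Example 2.12.2 ($N$ = 196 bags, $n$ = 36, $\bar{X}$ = 40 lbs., $\sigma$ = 3 lbs.).

```python
import math


def estimate_total(N, n, xbar, sigma):
    """Estimate a population total by N * xbar under random sampling.

    Returns (estimate, standard_error), the standard error being N * sigma / sqrt(n).
    """
    return N * xbar, N * sigma / math.sqrt(n)


def sd_of_sample_total(n, sigma):
    """Standard deviation of the sample total n * xbar: n * sigma / sqrt(n) = sigma * sqrt(n)."""
    return sigma * math.sqrt(n)


total, se = estimate_total(N=196, n=36, xbar=40, sigma=3)
print(total, se)  # 7840 98.0
```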

2.12 — The frequency distribution of sample means. The second major 
result from statistical theory is that, whatever the shape of the frequency distribution of the original population of $X$'s, the frequency distribution of $\bar{X}$ in repeated random samples of size $n$ tends to become normal as $n$ increases. To put the result more specifically, recall that if we wish to express a variable $X$ in standard measure, so that its mean is zero and its standard deviation is 1, we change the variable from $X$ to $(X - \mu)/\sigma$. For $\bar{X}$, the corresponding expression in standard measure (sm) is

$$\bar{X}_{sm} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

As $n$ increases, the probability that $\bar{X}_{sm}$ lies between any two limits $L_1$ and $L_2$ becomes more and more equal to the probability that the standard normal deviate $Z$ lies between $L_1$ and $L_2$. By expressing $\bar{X}$ in standard measure, table A 3 (the cumulative normal distribution) can be used to approximate the probability that $\bar{X}$ itself lies between any two limits.
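A minimal sketch of this recipe, assuming $\sigma$ is known: both limits are put in standard measure and the normal curve supplies the probability. Python's `statistics.NormalDist` (Python 3.8 or later) is used in place of table A 3; that substitution, and the function name, are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def prob_mean_between(L1, L2, mu, sigma, n):
    """Normal (Central Limit Theorem) approximation to P(L1 < Xbar < L2).

    Both limits are converted to standard measure using the standard error
    sigma / sqrt(n); NormalDist().cdf plays the role of table A 3.
    """
    se = sigma / math.sqrt(n)
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf((L2 - mu) / se) - z.cdf((L1 - mu) / se)


# Means of samples of 5 random digits (mu = 4.5, sigma = 2.872):
# approximate chance that the sample mean falls between 3.5 and 5.5.
print(round(prob_mean_between(3.5, 5.5, 4.5, 2.872, 5), 3))
```

Examples 2.14.2 and 2.14.5 can be worked the same way; for instance, `prob_mean_between(-1, 1, 0, 2.6, 16)` gives about 0.876.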
This result, known as the Central Limit Theorem (5), explains why the 
normal distribution and results derived from it are so commonly used 
with sample means, even when the original population is not normal. 
Apart from the condition of random sampling, the theorem requires 
very few assumptions: it is sufficient that $\sigma$ is finite and that the sample is a random sample from the population.

To the practical worker, a key question is: how large must $n$ be in order to use the normal distribution for $\bar{X}$? Unfortunately, no simple general answer is available. With variates like the heights of men, the original distribution is near enough normal so that normality may be assumed for most purposes. In this case a sample with $n = 1$ is large enough. There are also populations, at first sight quite different from the normal, in which $n$ = 4 or 5 will do. At the other extreme, some populations require sample sizes well over 100 before the distribution of $\bar{X}$ becomes at all near to the normal distribution.

As illustrations of the Central Limit Theorem, the results of two 
sampling experiments will be presented. In the first, the population is the 
population of random digits 0, 1, 2, ... 9 which we met in chapter 1. 
This is a discrete population. The variable $X$ has ten possible values 0, 1, 2, ..., 9, and has an equal probability 0.1 of taking any of these
values. The frequency distribution of X is represented in the upper part of 
figure 2.12.1. Clearly, the distribution does not look like a normal dis- 
tribution. Distributions of this type are sometimes called uniform , since 
every value is equally likely. 

Four hundred random samples of size 5 were drawn from the table of random digits (p. 543), each sample being a group of five consecutive numbers in a column. The frequency distribution of the sample means appears in the lower half of figure 2.12.1. A normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{5}$ is also shown. The agreement is surprisingly good, considering that the samples are only of size 5.

Fig. 2.12.1 — Upper part: Theoretical probability distribution of the random digits from 0 to 9. Lower part: Histogram showing the distribution of 400 means of samples of size 5 drawn from the random digits. The curve is the normal distribution with mean $\mu = 4.5$ and standard deviation $\sigma/\sqrt{n} = 2.872/\sqrt{5} = 1.284$.
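The same kind of experiment can be run on a computer. In this sketch, Python's `random.Random` generator stands in for the printed table of random digits (an assumption; the book's samples came from the table on p. 543), the seed is arbitrary, and the function name is hypothetical. The mean and standard deviation of the 400 sample means should come out near 4.5 and 1.284, though any particular run will differ somewhat from the book's figures.

```python
import random
import statistics


def digit_sample_means(k=400, n=5, seed=1):
    """Means of k samples of n random digits 0-9.

    random.Random stands in for the printed table of random digits;
    the seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    return [statistics.fmean(rng.randrange(10) for _ in range(n)) for _ in range(k)]


means = digit_sample_means()
# Compare with the theoretical mean 4.5 and standard error 2.872 / sqrt(5) = 1.284.
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```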

Calculation of $\mu$ and $\sigma$. In fitting this normal distribution, the quantities $\mu$ and $\sigma$ were the mean and standard deviation of the original population of random digits. Although the calculation of $\bar{X}$ and $s$ for a sample has been discussed, we have not explained how to calculate $\mu$ and $\sigma$ for a population. In a discrete population, denote the distinct values of the measurement $X$ by $X_1, X_2, \ldots, X_k$. In the population of random digits, $k = 10$, and each value has an equal probability, one-tenth. In a more general discrete population, the value $X_i$ may appear with probability or relative frequency $P_i$. We could, for example, have a population of digits in which a 0 is 20 times as frequent as a 1. Since the probabilities must add to 1, we have

$$\sum_{i=1}^{k} P_i = 1$$

The expression on the left is read “the sum of the $P_i$ from $i$ equals 1 to $k$.” The population mean $\mu$ is defined as

$$\mu = \sum_{i=1}^{k} P_i X_i$$

Like $\bar{X}$ in a sample, the quantity $\mu$ is the average or mean of the values of $X_i$ in the population, except that each $X_i$ is weighted by its relative frequency of occurrence.

For the random digits, every $P_i = 0.1$. Thus

$$\mu = (0.1)(0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = (0.1)(45) = 4.5$$

The population $\sigma$ comes from the deviations $X_i - \mu$. With the random digits, the first deviation is $0 - 4.5 = -4.5$, and the successive deviations are $-3.5, -2.5, -1.5, -0.5, +0.5, +1.5, +2.5, +3.5$, and $+4.5$. The population variance, $\sigma^2$, is defined as

$$\sigma^2 = \sum_{i=1}^{k} P_i (X_i - \mu)^2$$

Thus, $\sigma^2$ is the weighted average of the squared deviations of the values in the population from the population mean. Numerically, since each squared deviation occurs twice (once from a negative and once from a positive deviation), each time with probability 0.1,

$$\sigma^2 = (0.2)[(4.5)^2 + (3.5)^2 + (2.5)^2 + (1.5)^2 + (0.5)^2] = 8.25$$

This gives $\sigma = \sqrt{8.25} = 2.872$, so that $\sigma/\sqrt{5} = 1.284$.

There is a shortcut method of finding $\sigma^2$ without computing any deviations: it is similar to the corresponding shortcut formula for $\sum x^2$. The formula is:

$$\sigma^2 = \sum_{i=1}^{k} P_i X_i^2 - \mu^2$$
With the normal distribution, $\mu$ is, as above, the average of the values of $X$, and $\sigma^2$ is the average of the squared deviations from the population
mean. Since the normal population is continuous, having an infinite 
number of values, formulas from the integral calculus are necessary in 
writing down these definitions. 

As a student or classroom exercise, drawing samples of size 5 from the random digit tables is recommended as an easy way of seeing the Central Limit Theorem at work. The total of each sample is quickly obtained mentally. To avoid divisions by 5, work with sample totals instead of means. The sample total, $5\bar{X}$, has mean (5)(4.5) = 22.5 and standard deviation (5)(1.284) = 6.420 in repeated sampling. In forming
the frequency distribution, put the totals 20, 21, 22, 23 in the central class, 
each class containing four consecutive totals. Although rather broad, 
this grouping is adequate unless, say, 500 samples have been drawn. 
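The classroom exercise can also be imitated in code. This sketch rests on stated assumptions: `random.Random` replaces the random-digit table, the seed is arbitrary, the function name is hypothetical, and each class holds four consecutive totals aligned so that 20 to 23 form one class, as the text suggests. It prints a rough text histogram with one star per five samples.

```python
import random
from collections import Counter


def class_counts_of_totals(k=500, seed=3):
    """Draw k samples of 5 random digits and tally their totals in classes of four.

    Classes are aligned so that 20, 21, 22, 23 form one class; each key is
    the (lowest, highest) total in a class.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(k):
        total = sum(rng.randrange(10) for _ in range(5))
        low = 20 + 4 * ((total - 20) // 4)  # floor division keeps classes aligned below 20 too
        counts[(low, low + 3)] += 1
    return counts


for (low, high), f in sorted(class_counts_of_totals().items()):
    print(f"{low:2d}-{high:2d}  {'*' * (f // 5)}")
```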

The second sampling experiment illustrates the case in which a large 
sample size must be drawn if X is to be nearly normal. This happens with 
populations that are markedly skew, particularly if there are a few values 
very far from the mean. The population chosen consisted of the sizes 
(number of inhabitants) of U.S. cities having over 50,000 inhabitants in 
1950 (6), excluding the four largest cities. All except one have sizes rang- 
ing between 50,000 and 1,000,000. The exception, the largest city in the 
population, contained 1,850,000 inhabitants. The frequency distribution is shown at the top of figure 2.12.2. Note how asymmetrical the distribution is, the smallest class having much the highest frequency. The city
with 1,850,000 inhabitants is not shown on this histogram: it would ap- 
pear about 4 inches to the right of the largest class. 

A set of 500 random samples with $n = 25$ and another set with $n = 100$ were drawn. The frequency distributions of the sample means appear in the middle and lower parts of figure 2.12.2. With $n = 25$, the distribution has moved towards the normal shape but is still noticeably asymmetrical. There is some further improvement towards symmetry with $n = 100$, but a normal curve would still be a poor fit. Evidently, samples of 400 to 500 would be necessary to use the normal approximation with any assurance. Part of the trouble is caused by the 1,850,000 city: the means
for n = 100 would be more nearly normal if this city had been excluded 
from the population. On the other hand, the situation would be worse if 
the four largest cities had been included. 

Combining the theorems in this and the previous section, we now have the very useful result that in samples of reasonable size, $\bar{X}$ is approximately normally distributed about $\mu$, with standard deviation or standard error $\sigma/\sqrt{n}$.
Fig. 2.12.2 — Top part: Frequency distribution of the populations of 228 U.S. cities having populations over 50,000 in 1950. Middle part: Frequency distribution of the means of 500 random samples of size 25. Bottom part: Frequency distribution of the means of 500 random samples of size 100.
EXAMPLE 2.12.1 — A population of heights of men has a standard deviation $\sigma = 2.6$ inches. What is the standard error of the mean of a random sample of (i) 25 men, (ii) 100 men? Ans. (i) 0.52 in. (ii) 0.26 in.

EXAMPLE 2.12.2 — In order to estimate the total weight of a batch of 196 bags that 
are to be shipped, each of a random sample of 36 bags is weighed, giving $\bar{X}$ = 40 lbs. Assuming $\sigma$ = 3 lbs., estimate the total weight of the 196 bags and give the standard error of your estimate. Ans. 7,840 lbs.; standard error, 98 lbs.

EXAMPLE 2.12.3 — In estimating the mean height of a large group of boys with $\sigma = 1.5$ in., how large a sample must be taken if the standard error of the mean height is to be 0.2 in.? Ans. 56 boys.

EXAMPLE 2.12.4 — If perfect dice are thrown repeatedly, the probability is 1/6 that 
each of the faces 1, 2, 3, 4, 5, 6 turns up. Compute $\mu$ and $\sigma$ for this population. Ans. $\mu = 3.5$, $\sigma = 1.71$.

EXAMPLE 2.12.5— If boys and girls are equally likely, the probabilities that a family of 
size two contains 0, 1, 2 boys are, respectively, 1/4, 1/2, and 1/4. Find $\mu$ and $\sigma$ for this population. Ans. $\mu = 1$, $\sigma = 1/\sqrt{2} = 0.71$.

EXAMPLE 2.12.6 — The following sampling experiment shows how the Central Limit Theorem performs with a population simulating what is called a U-shaped distribution. In the random digits table, score 0, 1, 2, 3 as 0; 4, 5 as 1; and 6, 7, 8, 9 as 2. In this population, the probabilities of scores of 0, 1, 2 are 0.4, 0.2, and 0.4, respectively. This is a discrete distribution in which the central ordinate, 0.2, is lower than the two outside ordinates, 0.4.
Draw a number of samples of size 5, using the random digits table. Record the total score 
for each sample. The distribution of total scores will be found fairly similar to the bell- 
shaped normal curve. The theoretical distribution of the total scores is as follows: 

Score 0 or 10 1 or 9 2 or 8 3 or 7 4 or 6 5 

Prob. .010 .026 .077 .115 .182 .179 

That is, the probability of a 0 and that of a 10 are both 0.010. 
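The theoretical distribution in this table can be reproduced exactly by convolving the score distribution with itself five times. The sketch below (function name hypothetical) uses exact fractions; its results for totals 0 to 5, namely 0.0102, 0.0256, 0.0768, 0.1152, 0.1824, and 0.1795, agree with the tabled probabilities to within rounding in the third decimal.

```python
from fractions import Fraction


def distribution_of_total(score_probs, n):
    """Exact distribution of the total of n independent scores, by repeated convolution.

    score_probs maps each possible score to its probability.
    """
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, p in dist.items():
            for score, q in score_probs.items():
                new[total + score] = new.get(total + score, 0) + p * q
        dist = new
    return dist


# Scores 0, 1, 2 with probabilities 0.4, 0.2, 0.4; samples of size 5.
probs = {0: Fraction(2, 5), 1: Fraction(1, 5), 2: Fraction(2, 5)}
dist = distribution_of_total(probs, 5)
for total in range(11):
    print(total, round(float(dist[total]), 4))
```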


2.13 — Confidence intervals for $\mu$ when $\sigma$ is known. Given a random sample of size $n$ from a population, where $n$ is large enough so that $\bar{X}$ can be assumed normally distributed, we are now in a position to make an interval estimate of $\mu$. For simplicity, we assume in this section that $\sigma$ is known. This is not commonly so in practice. In some situations, however, previous populations similar to the one now being investigated all have about the same standard deviation, which is known from these previous results. Further, the value of $\sigma$ can sometimes be found from theoretical considerations about the nature of the population.

We first show how to find a 95% confidence interval. In section 2 A it was pointed out that if a variate $X$ is drawn from a normal distribution, the probability is about 0.95 that $X$ lies between $\mu - 2\sigma$ and $\mu + 2\sigma$. More exactly, the limits corresponding to a probability 0.95 are $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$. Apply this result to $\bar{X}$, remembering that in repeated sampling $\bar{X}$ has a standard deviation $\sigma/\sqrt{n}$. Thus, unless an unlucky 5% chance has come off, $\bar{X}$ will lie between $\mu - 1.96\sigma/\sqrt{n}$ and $\mu + 1.96\sigma/\sqrt{n}$. Expressing this as a pair of inequalities, we write

$$\mu - 1.96\sigma/\sqrt{n} \le \bar{X} \le \mu + 1.96\sigma/\sqrt{n}$$

apart from a 5% chance. These inequalities can be rewritten so that they provide limits for $\mu$ when we know $\bar{X}$. The left-hand inequality is equivalent to the statement that


$$\mu \le \bar{X} + 1.96\sigma/\sqrt{n}$$

In the same way, the right-hand inequality implies that 

$$\mu \ge \bar{X} - 1.96\sigma/\sqrt{n}$$

Putting the two together, we reach the statement that unless an unlucky 
5% chance occurred in drawing the sample, 

$$\bar{X} - 1.96\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\sigma/\sqrt{n}$$

This is the 95% confidence interval for $\mu$.

Similarly, the 99% confidence interval for $\mu$ is

$$\bar{X} - 2.58\sigma/\sqrt{n} \le \mu \le \bar{X} + 2.58\sigma/\sqrt{n}$$

because the probability is 0.99 that a normal deviate $Z$ lies between the limits $-2.58$ and $+2.58$.

To find the confidence interval corresponding to any confidence probability $P$, read from the cumulative normal table (table A 3) a value $Z_P$, say, such that the area given in the table is $P/2$. Then the probability that a normal deviate lies between $-Z_P$ and $+Z_P$ will be $P$. The confidence interval is

$$\bar{X} - Z_P\sigma/\sqrt{n} \le \mu \le \bar{X} + Z_P\sigma/\sqrt{n}$$
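The recipe for $Z_P$ and the interval can be sketched as follows, assuming $\sigma$ is known. `statistics.NormalDist().inv_cdf` (Python 3.8+) replaces the lookup in table A 3: an area of $P/2$ from 0 to $Z_P$ is the same as a cumulative probability of $0.5 + P/2$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_two_sided(P):
    """Z_P such that a standard normal deviate lies between -Z_P and +Z_P with probability P.

    Table A 3 gives the area from 0 to Z, so we need area P/2 there,
    i.e. a cumulative probability of 0.5 + P/2.
    """
    return NormalDist().inv_cdf(0.5 + P / 2)


def mean_interval_sigma_known(xbar, sigma, n, P=0.95):
    """Two-sided confidence interval xbar -/+ Z_P * sigma / sqrt(n)."""
    half = z_two_sided(P) * sigma / math.sqrt(n)
    return xbar - half, xbar + half


for P in (0.80, 0.90, 0.95, 0.99):
    print(P, round(z_two_sided(P), 3))
```

The loop prints $Z_P$ for 80%, 90%, 95%, and 99% confidence (about 1.282, 1.645, 1.960, and 2.576). With hypothetical values $\bar{X}$ = 50, $\sigma$ = 10, and $n$ = 25, `mean_interval_sigma_known(50, 10, 25)` gives limits of about 46.08 and 53.92.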

One-sided confidence limits. Sometimes we want to find only an upper 
limit or a lower limit for pi, but not both. A company making large 
batches of a chemical product might have, as part of its quality control 
program, a regulation that each batch be tested to ensure that it does not 
contain more than 25 parts per million of a certain impurity, apart from 
a 1 in 100 chance. The test consists of drawing out n amounts of the prod- 
uct from the batch, and determining the concentration of impurity in 
each amount. If the batch is to pass the test, the 99% upper confidence 
limit for $\mu$ must be not more than 25 parts per million. Similarly, certain roots of tropical trees are a source of a potent insecticide whose concentration varies considerably from root to root. The buyer of a large shipment of these roots wants a guarantee that the concentration of the active ingredient in the shipment exceeds some stated value. It may be agreed between buyer and seller that the shipment is acceptable if, say, the 95% lower confidence limit for the average concentration $\mu$ exceeds the desired
minimum. 

To find a one-sided or one-tailed limit with confidence probability 
95%, we want a normal deviate Z such that the area beyond Z in one tail 
is 0.05. In table A 3, the area from 0 to Z will be 0.45, and the value of Z 
is 1.645. Apart from a 5% chance in drawing the sample,

$$\bar{X} \le \mu + 1.645\sigma/\sqrt{n}$$

This gives, as the lower 95% confidence limit for $\mu$,

$$\mu \ge \bar{X} - 1.645\sigma/\sqrt{n}$$

The upper limit is $\bar{X} + 1.645\sigma/\sqrt{n}$. For a 99% limit the value of $Z$ is 2.326. For a one-sided limit with confidence probability $P$ (expressed as a proportion), read table A 3 to find the $Z$ that corresponds to probability $(P - 0.5)$.
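One-sided limits need only a change in how $Z$ is found: the area below $Z$ is $P$ (equivalently, $P - 0.5$ from 0 to $Z$ in table A 3). A minimal sketch, again using `statistics.NormalDist` in place of the table, with hypothetical function names:

```python
import math
from statistics import NormalDist


def z_one_sided(P):
    """Z with area P below it; table A 3 equivalently shows area P - 0.5 from 0 to Z."""
    return NormalDist().inv_cdf(P)


def lower_limit(xbar, sigma, n, P=0.95):
    """One-sided lower confidence limit for mu, sigma known."""
    return xbar - z_one_sided(P) * sigma / math.sqrt(n)


def upper_limit(xbar, sigma, n, P=0.95):
    """One-sided upper confidence limit for mu, sigma known."""
    return xbar + z_one_sided(P) * sigma / math.sqrt(n)


print(round(z_one_sided(0.95), 3), round(z_one_sided(0.99), 3))  # 1.645 2.326
```

Applied to Example 2.14.3, `lower_limit(10.2, 3.3, 9, P=0.99)` gives about 7.64; for the ledger total of Example 2.14.4, multiplying `upper_limit(2.14, 1.30, 100)` by $N$ = 1,000 gives about 2,354, the answer stated there.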

2.14 — Size of sample. The question: How large a sample must I 
take? is frequently asked by investigators. The question is not easy to 
answer. But if the purpose of the investigation is to estimate the mean 
of a population from the results of a sample, the methods in the preceding
sections are helpful. 

First, the investigator must state how accurate he would like his
sample estimate to be. Does he want it to be correct to within 1 unit, 5 
units, or 10 units, on the scale on which he is measuring? In trying to 
answer this question, he thinks of the purposes to which the estimate will 
be put, and tries to envisage the consequences of having errors of different 
amounts in the estimate. If the estimate is to be made in order to guide 
a specific business or financial decision, calculations may indicate the 
level of accuracy necessary to make the estimate useful. In scientific re- 
search it is often harder to do this, and there may be an element of arbi- 
trariness in the answer finally given. 

By one means or another, the investigator states that he would like 
his estimate to be correct to within some limit ±L, say. Since the normal 
curve extends from minus infinity to plus infinity, we cannot guarantee that $\bar{X}$ is certain to lie between the limits $\mu - L$ and $\mu + L$. We can, however, make the probability that $\bar{X}$ lies between these limits as large as we please. In practice, this probability is usually set at 95% or 99%. For the 95% probability, we know that there is a 95% chance that $\bar{X}$ lies between the limits $\mu - 1.96\sigma/\sqrt{n}$ and $\mu + 1.96\sigma/\sqrt{n}$. This gives the equation

$$1.96\sigma/\sqrt{n} = L$$

which is solved for n. 

The equation requires a knowledge of $\sigma$, although the sample has not yet been drawn. From previous work on this or similar populations, the investigator guesses a value of $\sigma$. Since this guess is likely to be somewhat in error, we might as well replace 1.96 by 2 for simplicity. Solving $2\sigma/\sqrt{n} = L$ for $n$ gives the formula

$$n = 4\sigma^2/L^2$$

The formula for 99% probability is $n = 6.6\sigma^2/L^2$.
To summarize, the investigator must supply: (i) an upper limit $L$ to the amount of error that he can tolerate in the estimate, (ii) the desired probability that the estimate will lie within this limit of error, and (iii) an advance guess at the population standard deviation $\sigma$. The formula for $n$ is then very simple.
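The sample-size rule is a one-line formula. The sketch below (hypothetical function name) keeps the text's rounded multipliers, 4 for 95% and 6.6 for 99%, as a parameter, and the caller rounds up to a whole number of units. The usage lines take the figures of Example 2.14.6: a guessed $\sigma$ of 60 and $L$ = 20, both in dollars.

```python
import math


def sample_size(sigma_guess, L, factor=4.0):
    """n = factor * sigma^2 / L^2.

    factor = 4 comes from rounding 1.96 up to 2 (95% probability);
    the text uses factor = 6.6 for 99% probability.
    """
    return factor * sigma_guess ** 2 / L ** 2


print(math.ceil(sample_size(60, 20)))        # 95%: 36
print(math.ceil(sample_size(60, 20, 6.6)))   # 99%
```

Halving $L$ multiplies $n$ by 4, and changing the multiplier from 4 to 6.6 raises $n$ by 65%, as in Example 2.14.8.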

EXAMPLE 2.14.1 — Find (i) the 80%, (ii) the 90% confidence limits for $\mu$, given $\bar{X}$ and $\sigma$. Ans. (i) $\bar{X} \pm 1.28\sigma/\sqrt{n}$, (ii) $\bar{X} \pm 1.64\sigma/\sqrt{n}$.

EXAMPLE 2.14.2 — The heights of a random sample of 16 men from a population with $\sigma = 2.6$ in. are measured. What is the confidence probability that $\bar{X}$ does not differ from $\mu$ by more than 1 in.? Ans. $P = 0.876$.

EXAMPLE 2.14.3 — For the insecticide roots, the buyer wants assurance that the 
average content of the active ingredient is at least 8 lbs. per 100 lbs., apart from a 1-in-100 chance. A sample of 9 bundles of roots drawn from the batch gives, on analysis, $\bar{X}$ = 10.2 lbs. active ingredient per 100 lbs. If $\sigma$ = 3.3 lbs. per 100 lbs., find the lower 99% confidence limit for $\mu$. Does the batch meet the specification? Ans. Lower limit 7.6 lbs. per 100 lbs.
No. 


EXAMPLE 2.14.4 — In the auditing of a firm’s accounts receivable, 100 entries were 
checked out of a ledger containing 1,000 entries. For these 100 entries, the auditor's check showed that the stated total amount receivable exceeded the correct amount receivable by \$214. Calculate an upper 95% confidence limit for the amount by which the reported total receivable in the whole ledger exceeds the correct amount. Assume $\sigma$ = \$1.30 in the population of the bookkeeping errors. Ans. \$2,354. Note: for an estimated population total, the formula for a one-sided upper limit for $N\mu$ is $N\bar{X} + NZ\sigma/\sqrt{n}$. Note also that you are given the sample total $n\bar{X}$ = \$214.

EXAMPLE 2.14.5 — When measurements are rounded to the nearest whole number, it 
can often be assumed that the error due to rounding is equally likely to lie anywhere between 
-0.5 and +0.5. That is, rounding errors follow a uniform distribution between the limits 
-0.5 and +0.5. From theory, this distribution has $\mu = 0$, $\sigma = 1/\sqrt{12} = 0.29$. If 100 independent, rounded measurements are added, what is the probability that the error in the total due to rounding does not exceed 5? Ans. $P = 0.916$.

EXAMPLE 2.14.6 — In the part of a large city in which houses are rented, an economist wishes to estimate the average monthly rent correct to within ±\$20, apart from a 1-in-20 chance. If he guesses that $\sigma$ is about \$60, how many houses must he include in his sample? Ans. $n = 36$.

EXAMPLE 2.14.7 — Suppose that in the previous example the economist would like 99% probability that his estimate is correct to within \$20. Further, he learns that in a recent sample of 100 houses, the lowest rent was \$30 and the highest was \$260. Estimating $\sigma$ from these data, find the sample size needed. Ans. $n = 36$. This estimate is, of course, very rough.

EXAMPLE 2.14.8 — Show that if we wish to cut the limit of error from $L$ to $L/2$, the sample size must be quadrupled. With the same $L$, if we wish 99% probability of being within the limit rather than 95% probability, what percentage increase in sample size is required? Ans. About 65% increase.

2.15 — “Student’s” t-distribution. In most applications in which 
sample means are used to estimate population means, the value of $\sigma$ is not known. We can, however, obtain an estimate $s$ of $\sigma$ from the sample data that give us the value of $\bar{X}$. If the sample is of size $n$, the estimate $s$ is based on $(n - 1)$ degrees of freedom. We require a distribution that will enable us to compute confidence limits for $\mu$, knowing $s$ but not $\sigma$. Known as “Student's” t-distribution, this result was discovered by W. S. Gosset in 1908 (7) and perfected by R. A. Fisher in 1926 (8). This distribution has revolutionized the statistics of small samples. In the next chapter you will be asked to verify the distribution by the same kind of sampling process you used for chi-square; indeed, it was by such sampling that Gosset first learned about it.

Fig. 2.15.1 — Distribution of t with 4 degrees of freedom. The shaded areas comprise 5% of the total area. The distribution is more peaked in the center and has higher tails than the normal.

The quantity $t$ is given by the equation

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

That is, $t$ is the deviation of the estimated mean from that of the population, measured in terms of $s/\sqrt{n}$ as the unit. We do not know $\mu$, though we may have some hypothesis about it. Without $\mu$, $t$ cannot be calculated; but its sampling distribution has been worked out.

The denominator, $s/\sqrt{n}$, is a useful quantity estimating $\sigma/\sqrt{n}$, the standard error of $\bar{X}$.

The distribution of t is laid out in table A 4, p. 549. In large samples 
it is practically normal with $\mu = 0$ and $\sigma = 1$. It is only for samples of size less than 30 that the distinction becomes obvious.

Like the normal, the t-distribution is symmetrical about the mean. This allows the probability in the table to be stated as that of a larger absolute value, sign ignored. For a sample of size 5, with 4 degrees of freedom, figure 2.15.1 shows such values of $t$ in the shaded areas; 2.5%
of them are in one tail and 2.5% in the other. Effectively, the table shows 
the two halves of the figure superimposed, giving the sum of the shaded 
areas (probabilities) in both. 
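For readers without table A 4 at hand, its two-tailed probabilities can be approximated in code. Python's standard library has no t-distribution, so this sketch (my substitution, not the book's method; function names hypothetical) integrates the standard density of $t$ with $\nu$ degrees of freedom,
$f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + t^2/\nu\right)^{-(\nu+1)/2}$,
by Simpson's rule from 0 to $|t|$, and then uses the symmetry of the distribution to obtain the probability of a larger absolute value.

```python
import math


def t_density(t, df):
    """Density of Student's t with df degrees of freedom (log-gamma avoids overflow for large df)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def prob_abs_t_exceeds(t_abs, df, steps=2000):
    """P(|t| > t_abs): one minus twice the area from 0 to t_abs (Simpson's rule; steps must be even)."""
    h = t_abs / steps
    total = t_density(0.0, df) + t_density(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * total * h / 3


print(round(prob_abs_t_exceeds(2.12, 16), 3))  # vitamin C example, 16 d.f.
```

For 16 d.f., `prob_abs_t_exceeds(2.12, 16)` comes out very close to 0.05, as in Example 2.15.2; `prob_abs_t_exceeds(2.086, 20)` is likewise about 0.05.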
EXAMPLE 2.15.1 — In the vitamin C sampling of table 2.8.1, $s_{\bar{X}} = 3.98/\sqrt{17} = 0.965$ mg./100 gm. Set up the hypothesis that $\mu$ = 17.954 mg./100 gm. Calculate $t$. Ans. 2.12.

EXAMPLE 2.15.2 — For the vitamin C sample, degrees of freedom = 17 - 1 = 16, the denominator of the fraction giving $s^2$. From table A 4, find the probability of a value of $t$ larger in absolute value than 2.12. Ans. 0.05. This means that, among random samples of $n = 17$ from normal populations, 5% of them are expected to have $t$-values below -2.12 or above 2.12.

EXAMPLE 2.15.3 — If samples of $n = 17$ are randomly drawn from a normal population and have $t$ calculated for each, what is the probability that $t$ will fall between -2.12 and +2.12? Ans. 0.95.

EXAMPLE 2.15.4 — If random samples of $n = 17$ are drawn from a normal population, what is the probability of $t$ greater than 2.12? Ans. 0.025.

EXAMPLE 2.15.5 — What size of sample would have $|t| > 2$ in 5% of all random samples from normal populations? Ans. 61. (Note the symbol for “absolute value,” that is, ignoring signs.)

EXAMPLE 2.15.6 — Among very large samples (d.f. = $\infty$), what value of $t$ would be exceeded in 2.5% of them? Ans. 1.96.

2.16 — Confidence limits for $\mu$ based on the t-distribution. With $\sigma$ known, the 95% limits for $\mu$ were given by the relations

$$\bar{X} - 1.96\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\sigma/\sqrt{n}$$

When $\sigma$ is replaced by $s$, the only change needed is to replace the number 1.96 by a quantity which we call $t_{0.05}$. To find $t_{0.05}$, read table A 4 in the column headed 0.050 and find the value of $t$ for the number of degrees of freedom in $s$. When the d.f. are infinite, $t_{0.05} = 1.960$. With 40 d.f., $t_{0.05}$ has increased to 2.021; with 20 d.f. it has become 2.086; and it continues to increase steadily as the number of d.f. declines.

The inequalities giving the 95% confidence limits then become 

$$\bar{X} - t_{0.05}\,s/\sqrt{n} \le \mu \le \bar{X} + t_{0.05}\,s/\sqrt{n}$$

As illustration, recall the vitamin C determinations in table 2.8.1; $n = 17$, $\bar{X} = 20$, and $s_{\bar{X}} = 0.965$ mg./100 gm. To get the 95% confidence interval (interval estimate):

1. Enter the table with d.f. = 17 - 1 = 16 and in the column headed 0.05 take the entry, $t_{0.05} = 2.12$.

2. Calculate the quantity

$$t_{0.05}\,s_{\bar{X}} = (2.12)(0.965) = 2.05 \text{ mg./100 gm.}$$

3. The confidence interval is from

20 - 2.05 = 17.95 to 20 + 2.05 = 22.05 mg./100 gm.

If you say that $\mu$ lies inside the interval from 17.95 to 22.05 mg./100 gm., you will be right unless a 1-in-20 chance has occurred in the sampling.

The point and 95% interval estimate of $\mu$ may be summarized this way: 20 ± 2.05 mg./100 gm.
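The three steps above amount to a single line of arithmetic. A minimal sketch (hypothetical function name) takes $s_{\bar{X}}$ and the tabled $t$ directly, so no t-table is needed in code:

```python
def t_limits(xbar, se_mean, t_value):
    """Limits xbar -/+ t_value * se_mean, where se_mean = s / sqrt(n).

    t_value is read from table A 4 for n - 1 d.f.: column 0.05 for 95%
    two-sided limits, column 0.10 for a one-sided 95% limit.
    """
    half = t_value * se_mean
    return xbar - half, xbar + half


lo, hi = t_limits(20, 0.965, 2.12)   # vitamin C: n = 17, 16 d.f.
print(round(lo, 2), round(hi, 2))    # 17.95 22.05
```

Example 2.16.3 works the same way: with $s_{\bar{X}} = 3.6/\sqrt{16} = 0.9$ and $t_{0.10} = 1.753$ for 15 d.f., `t_limits(5.6, 0.9, 1.753)` gives about 4.0 to 7.2.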
The proof of this result is similar to that given when $\sigma$ is known. Although $\mu$ is unknown, the drawing of a random sample creates a value of

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

that follows Student's t-distribution with $(n - 1)$ d.f. Now the quantity $t_{0.05}$ in table A 4 was computed so that the probability is 0.95 that a value of $t$ drawn at random lies between $-t_{0.05}$ and $+t_{0.05}$. Thus, there is a 95% chance that

$$-t_{0.05} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le +t_{0.05}$$

Multiply throughout by $s/\sqrt{n}$, and then add $\mu$ to each term in the inequalities. This gives, with 95% probability,

$$\mu - t_{0.05}\,s/\sqrt{n} \le \bar{X} \le \mu + t_{0.05}\,s/\sqrt{n}$$

The remainder of the proof is exactly the same as for $\sigma$ known. The limits may be expressed more compactly as $\bar{X} \pm t_{0.05}\,s_{\bar{X}}$. For a one-sided 95% limit, use $t_{0.10}$ in place of $t_{0.05}$.


EXAMPLE 2.16.1 — The yields of alfalfa from 10 plots were 0.8, 1.3, 1.5, 1.7, 1.7, 1.8, 2.0, 2.0, 2.0, and 2.2 tons per acre. Set 95% limits on the mean of the population of which this is a random sample. Ans. 1.41 and 1.99 tons per acre.

EXAMPLE 2.16.2 — In an investigation of growth in school children in private schools, the sample mean height of 265 boys of age 13 1/2-14 1/2 years was 63.84 in., with standard deviation $s$ = 3.08 in. What is the 95% confidence interval for $\mu$? Ans. 63.5 to 64.2 in.

EXAMPLE 2.16.3 — In a check of a day's work for each of a sample of 16 women engaged in tedious, repetitive work, the average number of minor errors per day was 5.6, with a sample s.d. of 3.6. Find (i) a 90% confidence interval for the population mean number of errors, (ii) a one-sided upper 90% limit to the population mean number of errors. Ans. (i) 4.0 to 7.2, (ii) 6.8.

EXAMPLE 2.16.4 — We have stated that the t-distribution differs clearly from the normal distribution only for samples of size less than 30. For a given value of $s_{\bar{X}}$, how much wider is (i) the 95%, (ii) the 99% confidence interval when the sample size is 30 than when the sample size is very large? Are there sample sizes for which the 95% and 99% intervals become twice as wide, for the same $s_{\bar{X}}$, as with very large samples? Ans. (i) 4.3% wider, (ii) 7.0% wider, since $s$ has 29 d.f. For a sample of size 3 (2 d.f.) the 95% interval is twice as wide, and for a sample of size 4 the 99% interval is twice as wide. With small samples, $s$ is not a good estimate of $\sigma$, and the confidence limits widen to allow for the chance that the sample $s$ is far removed from the true $\sigma$.

2.17 — Relative variation. Coefficient of variation. In describing the 
amount of variation in a population, a measure often used is the coefficient of variation $C = \sigma/\mu$. The sample estimate is $s/\bar{X}$. The standard deviation is expressed as a fraction, or sometimes as a percentage, of the mean. The utility of this measure lies partly in the fact that in many series the mean and standard deviation tend to change together. This is illustrated by the mean stature and corresponding standard deviation of girls from 1 to 18 years of age shown graphically in figure 2.17.1. Until the twelfth
year the standard deviation increases at a somewhat greater rate, relative 
to its mean, than does stature, causing the coefficient of variation to rise, 
but by the seventeenth year and thereafter C is back to where it started. 
Without serious discrepancy one may fix in mind the figure, $C$ = 3.75%,
as the relative standard deviation of adult human stature, male as well as 
female. More precisely, the coefficient rises rather steadily from infancy 
through puberty, falls sharply during a brief period of uniformity, then 
takes on its permanent value near 3.75%. 

A knowledge of relative variation is valuable in evaluating experi- 
ments. After the statistics of an experiment are summarized, one may 
judge of its success partly by looking at $C$. In corn variety trials, for example, although mean yield and standard deviation vary with location and
season, yet the coefficient of variation is often between 5% and 15%. 
Values outside this interval cause the investigator to wonder if an error 
has been made in calculation, or if some unusual circumstances throw 
doubt on the validity of the experiment. Similarly, each sampler knows 
what values of C may be expected in his own data, and is suspicious of 
any great deviation. If another worker with the same type of measure- 
ment reports C values much smaller than one’s own, it is worthwhile to 
try to discover why, since the reason may suggest ways of improving one’s 
precision. 



Fig. 2.17.1 — Graph of 3 time series: stature, standard deviation, and coefficient of variation of girls from 1 to 18 years of age. See reference (1).
Other uses of the coefficient of variation are numerous but less prevalent. Since C is the ratio of two averages having the same unit of measurement, it is itself independent of the unit employed. Thus, C is the same whether inches, feet, or centimeters are used to measure height. Also, the coefficient of variation of the yield of hay is comparable to that of the yield of corn. Experimental animals have characteristic coefficients of variation, and these may be compared despite the diversity of the variables measured. Such information is often useful in guessing a value of σ for the estimation of sample size as in section 2.14.
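To see the unit-free property concretely, here is a minimal Python sketch. The helper `coefficient_of_variation` and the six heights are hypothetical illustrations, not data from the text; converting inches to centimeters multiplies both s and X̄ by 2.54, so C comes out the same.

```python
import statistics


def coefficient_of_variation(values):
    """Sample coefficient of variation C = s / X-bar, as a percentage.

    statistics.stdev uses the n - 1 divisor, matching s in the text.
    """
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical heights in inches (illustrative only, not data from the text).
heights_in = [62.0, 64.5, 63.0, 66.0, 65.5, 61.5]
heights_cm = [h * 2.54 for h in heights_in]  # the same heights in centimeters

cv_in = coefficient_of_variation(heights_in)
cv_cm = coefficient_of_variation(heights_cm)
print(f"C in inches:      {cv_in:.3f}%")
print(f"C in centimeters: {cv_cm:.3f}%")
```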

Like many other ratios, the coefficient of variation is so convenient that some people overlook the information contained in the original data. Try to imagine how limited you would be in interpreting the stature-of-girls coefficients if they were not accompanied by X̄ and s. You would not know whether an increase in C is due to a rising s or a falling X̄, nor whether the saw-tooth appearance of the C-curve results from irregularities in one or both of the others, unless indeed you could supply the facts from your own fund of knowledge. The coefficient is informative and useful in the presence of X̄ and s, but abstracted from them it may be misleading.

EXAMPLE 2.17.1 — In experiments involving chlorophyll determinations in pineapple plants (10), the question was raised as to the method that would give the most consistent results. Three bases of measurement were tried, each involving 12-leaf samples, with the statistics reported below. From the coefficients of variation, it was decided that the methods were equally reliable, and the most convenient one could be chosen with no sacrifice of precision.


Statistics of Chlorophyll Determinations of 12-Leaf Samples From Pineapple Plants, Using Three Bases of Measurement

Statistic                                  100-gram     100-gram     100-sq. cm.
                                           Wet Basis    Dry Basis    Basis
Sample mean (milligrams)                   61.4         337          13.71
Sample standard deviation (milligrams)     5.22         31.2         1.20
Coefficient of variation (per cent)        8.5          9.3          8.8
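As a quick arithmetic check, the coefficients in the last row can be recomputed from the printed means and standard deviations as C = 100 s/X̄. A small sketch; the dictionary keys and the helper name `cv_percent` are only illustrative labels:

```python
# Means and standard deviations (milligrams) as printed in the chlorophyll table.
bases = {
    "100-gram wet basis": (61.4, 5.22),
    "100-gram dry basis": (337.0, 31.2),
    "100-sq. cm. basis": (13.71, 1.20),
}


def cv_percent(mean, sd):
    """Coefficient of variation, s / X-bar, expressed in per cent."""
    return 100 * sd / mean


results = {name: round(cv_percent(mean, sd), 1) for name, (mean, sd) in bases.items()}
for name, cv in results.items():
    print(f"{name}: C = {cv}%")
```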


EXAMPLE 2.17.2 — In a certain laboratory there is a colony of rats in which the coefficient of variation of the weights of males between 56 and 84 days of age is close to 13%. Estimate the sample standard deviation of the weights of a lot of these rats whose sample mean weight is 200 grams. Ans. 26 grams.

EXAMPLE 2.17.3 — If C is the coefficient of variation in a population, show that the coefficient of variation of the mean of a random sample of size n is C/√n in repeated sampling. Does the same result hold for the sample total? Ans. Yes.
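The result of Example 2.17.3 can also be checked by experimental sampling, in the spirit of chapter 3. The sketch below assumes a hypothetical normal population with μ = 50 and σ = 5 (so C = 10%) and uses Python's `random.gauss` as the source of normal values; with n = 4 both coefficients should come out near C/√4 = 5%. The totals are just n times the means, so their coefficient is identical.

```python
import random
import statistics


def cv(values):
    """Coefficient of variation (per cent) of a list of numbers."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def simulate_cv_of_mean_and_total(mu, sigma, n, reps=20_000, seed=1):
    """Draw `reps` normal samples of size n; return the CVs of their means and totals."""
    rng = random.Random(seed)
    means, totals = [], []
    for _ in range(reps):
        total = sum(rng.gauss(mu, sigma) for _ in range(n))
        totals.append(total)
        means.append(total / n)
    return cv(means), cv(totals)


# Hypothetical population: mu = 50, sigma = 5, so C = 10%.
cv_mean, cv_total = simulate_cv_of_mean_and_total(mu=50, sigma=5, n=4)
print(f"CV of sample mean:  {cv_mean:.2f}%")
print(f"CV of sample total: {cv_total:.2f}%")
```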

EXAMPLE 2.17.4 — If the coefficient of variation of the gain in weight of a certain animal over a month is 10%, what would you expect the coefficient of variation of the gain over a four-month period to be? Ans. The answer is complicated, and cannot be given fully at this stage. If σ and μ were the same during each of the four months, and if the gains were independent from month to month, the answer would be C/√4 = C/2, by the result in the preceding example. But animals sometimes grow by spurts, so that the gains in successive periods may not be independent, and our formula for the standard deviation of a sample does not apply in this case. The answer is likely to lie between C and C/2. The point will be clarified when we study correlation.
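One way to see why the answer falls between C and C/2 is a formula that anticipates the later chapter on correlation. The sketch below is only illustrative and rests on assumptions the example does not make: each month has the same μ and σ, and every pair of months has the same correlation ρ. With ρ = 0 it reproduces C/2, with ρ = 1 it gives C, and intermediate positive ρ lands in between (a negative ρ would push the value below C/2).

```python
import math


def cv_of_total_gain(c_period, k, rho):
    """CV of the total gain over k periods.

    Illustrative assumptions (not from the text): every period has the same
    mean mu and SD sigma (so each has CV c_period), and every pair of periods
    has the same correlation rho. Then
        Var(total) = k*sigma**2 + k*(k - 1)*rho*sigma**2,   mean(total) = k*mu,
    so CV(total) = c_period * sqrt((1 + (k - 1)*rho) / k).
    """
    return c_period * math.sqrt((1 + (k - 1) * rho) / k)


for rho in (0.0, 0.5, 1.0):
    print(f"rho = {rho}: CV over 4 months = {cv_of_total_gain(10.0, 4, rho):.2f}%")
```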
CHAPTER THREE


Experimental sampling from
a normal population


3.1 — Introduction. In chapter 1 the facts about confidence intervals for a proportion were verified through experimental sampling. This same device illustrated the theoretical distribution of chi-square that forms the basis of the test of a null hypothesis about the population proportion. In chapter 2 the results of two experimental samplings were presented to show that the distribution of means of random samples tends to approximate the normal distribution with standard deviation σ/√n, as predicted by the Central Limit Theorem.

In this chapter we present further experimental samplings from a population simulating the normal, with instructions so that the reader can perform his own samplings. The purposes are as follows:

(1) To provide additional verification of the result that the sample means are normally distributed with S.D. = σ/√n.

(2) To investigate the sampling distribution of s², regarded as an estimate of σ², and of s, regarded as an estimate of σ. Thus far we have not been much concerned with the question: How good an estimate of σ² is s²? The frequency distribution of s² in normal samples has, however, been worked out and tabulated. Apart from a multiplier, it is an extended form of the chi-square distribution which we met in chapter 1.

(3) To illustrate the sampling distribution of t with 9 degrees of freedom, by comparing the values of t found in the experimental sampling with the theoretical distribution.

(4) To verify confidence interval statements based on the t-distribution.

The population that we have devised to simulate a normal population 
departs from it in two respects : it is limited in size and range instead of 
being infinite, and has a discontinuous variate instead of the continuous 
one implied in the theory. The effects of these departures will scarcely be 
noticed, because they are small in comparison with sampling variation. 

3.2 — A finite population simulating the normal. In table 3.2.1 are the weight gains of a hundred swine, slightly modified from experimental data so as to form a distribution which is approximately normal with μ = 30 pounds and σ = 10 pounds. The items are numbered from 00 to 99 in order that they may be identified easily with corresponding numbers taken from the table of random digits. The salient features of this kind of distribution may be discerned in figure 3.2.1. The gains, clustering at the midpoint of the array, thin out symmetrically, slowly at first, then more and more rapidly: two-thirds of the gains lie in the interval 30 ± 10 pounds, that is, in an interval of two standard deviations centered on the mean. In a real population, indefinitely great in number of individuals, greater extremes doubtless would exist, but that need cause us little concern.

TABLE 3.2.1

Array of Gains in Weight (Pounds) of 100 Swine During a Period of 20 Days
The gains approximate a normal distribution with μ = 30 pounds and σ = 10 pounds

Item  Gain    Item  Gain    Item  Gain    Item  Gain
00     3      25    24      50    30      75    37
01     7      26    24      51    30      76    37
02    11      27    24      52    30      77    38
03    12      28    25      53    30      78    38
04    13      29    25      54    30      79    39
05    14      30    25      55    31      80    39
06    15      31    26      56    31      81    39
07    16      32    26      57    31      82    40
08    17      33    26      58    31      83    40
09    17      34    26      59    32      84    41
10    18      35    27      60    32      85    41
11    18      36    27      61    33      86    41
12    18      37    27      62    33      87    42
13    19      38    28      63    33      88    42
14    19      39    28      64    33      89    42
15    19      40    28      65    33      90    43
16    20      41    29      66    34      91    43
17    20      42    29      67    34      92    44
18    21      43    29      68    34      93    45
19    21      44    29      69    35      94    46
20    21      45    30      70    35      95    47
21    22      46    30      71    35      96    48
22    22      47    30      72    36      97    49
23    23      48    30      73    36      98    53
24    23      49    30      74    36      99    57
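The stated parameters can be confirmed directly from table 3.2.1. In the sketch below (Python; the names `COUNTS` and `GAINS` are illustrative), the population is stored as a count for each gain; because the table is an array in increasing order, position i of the expanded list is the gain of item i. The population standard deviation uses divisor N = 100, since these 100 pigs are the whole population.

```python
import statistics

# Gains (pounds) in table 3.2.1, stored as {gain: number of pigs with that gain}.
COUNTS = {
    3: 1, 7: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 2, 18: 3,
    19: 3, 20: 2, 21: 3, 22: 2, 23: 2, 24: 3, 25: 3, 26: 4, 27: 3, 28: 3,
    29: 4, 30: 10, 31: 4, 32: 2, 33: 5, 34: 3, 35: 3, 36: 3, 37: 2, 38: 2,
    39: 3, 40: 2, 41: 3, 42: 3, 43: 2, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1,
    49: 1, 53: 1, 57: 1,
}
# The table is an array in increasing order, so GAINS[i] is the gain of item i (00-99).
GAINS = [gain for gain, count in sorted(COUNTS.items()) for _ in range(count)]

mu = statistics.mean(GAINS)
sigma = statistics.pstdev(GAINS)  # population SD: divisor N = 100, not N - 1
within_one_sd = sum(1 for g in GAINS if mu - sigma <= g <= mu + sigma)

print(f"N = {len(GAINS)}, mu = {mu}, sigma = {sigma:.1f}")
print(f"gains within 30 +/- 10 pounds: {within_one_sd} of 100")
```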

The relation of the histogram to the array is clear. After the class bounds are decided upon, it is necessary merely to count the dots lying between the vertical lines, then make the height of the rectangle proportional to their number. The central value, or class mark, of each interval is indicated on the horizontal scale of gains.

In table 3.2.2 is the frequency distribution which is graphically repre- 
sented in figure 3.2.1. Only the class marks are entered in the first row. 
The class intervals are from 2.5 to 7.5, etc. 
Fig. 3.2.1 — Upper part: graphical representation of the array of 100 normally distributed gains in weight. Lower part: histogram of the same gains (horizontal axis: gain in pounds). The altitude of a rectangle in the histogram is proportional to the number of dots in the array which lie between its vertical sides.



TABLE 3.2.2

Frequency Distribution of Gains in Weight of 100 Swine
(A finite population approximating the normal)

Class mark (pounds)    5   10   15   20   25   30   35   40   45   50   55
Frequency              2    2    6   13   15   23   16   13    6    2    2
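Table 3.2.2 can be reproduced by tallying the gains of table 3.2.1 into 5-pound classes. A minimal sketch, assuming the class limits 2.5-7.5, 7.5-12.5, ..., 52.5-57.5 stated above; since every gain is a whole number of pounds, none falls exactly on a limit. The helper `class_mark` is hypothetical.

```python
from collections import Counter

# Table 3.2.1 as {gain: count}; expanded in increasing order so GAINS[i] is item i.
COUNTS = {
    3: 1, 7: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 2, 18: 3,
    19: 3, 20: 2, 21: 3, 22: 2, 23: 2, 24: 3, 25: 3, 26: 4, 27: 3, 28: 3,
    29: 4, 30: 10, 31: 4, 32: 2, 33: 5, 34: 3, 35: 3, 36: 3, 37: 2, 38: 2,
    39: 3, 40: 2, 41: 3, 42: 3, 43: 2, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1,
    49: 1, 53: 1, 57: 1,
}
GAINS = [gain for gain, count in sorted(COUNTS.items()) for _ in range(count)]


def class_mark(gain, width=5, lowest_limit=2.5):
    """Midpoint of the class (lowest_limit + k*width, lowest_limit + (k+1)*width) holding gain."""
    k = int((gain - lowest_limit) // width)
    return lowest_limit + width * k + width / 2


freq = Counter(class_mark(g) for g in GAINS)
marks = sorted(freq)
frequencies = [freq[m] for m in marks]
print("Class mark:", [int(m) for m in marks])
print("Frequency: ", frequencies)
```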


3.3 — Random samples from a normal distribution. An easy way to draw random samples from the table of pig gains is to take numbers consecutively from the table of random numbers, table A 1, then match them with the gains by means of the integers, 00 to 99, in table 3.2.1. To avoid duplicating the samples of others in class work, start at some randomly selected point in the table of random numbers instead of at the beginning, then proceed upward, downward, or crosswise. Suppose you have hit upon the digit 8 in row 71, column 29. This, with the following digit, 3, specifies pig number 83 in table 3.2.1, a pig whose gain is 40 pounds. Hence, 40 pounds is the first number of the sample. Moving upward among the random numbers you read the integers 09, 75, 90, etc., and record the corresponding gains from the table, 17, 37, and 43 pounds. Continuing, you get as many gains and as many samples as you wish.
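For readers working at a computer, the matching step can be sketched as follows. The lookup reproduces the worked example (items 83, 09, 75, 90 give 40, 17, 37, and 43 pounds). Replacing table A 1 with Python's seeded `random.Random` generator is a swapped-in source of random digits, not the book's procedure; `randrange(100)` plays the role of a two-digit random number, and `gains_for_item_numbers` is a hypothetical helper.

```python
import random

# Table 3.2.1 as {gain: count}; expanded in increasing order so GAINS[i] is item i.
COUNTS = {
    3: 1, 7: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 2, 18: 3,
    19: 3, 20: 2, 21: 3, 22: 2, 23: 2, 24: 3, 25: 3, 26: 4, 27: 3, 28: 3,
    29: 4, 30: 10, 31: 4, 32: 2, 33: 5, 34: 3, 35: 3, 36: 3, 37: 2, 38: 2,
    39: 3, 40: 2, 41: 3, 42: 3, 43: 2, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1,
    49: 1, 53: 1, 57: 1,
}
GAINS = [gain for gain, count in sorted(COUNTS.items()) for _ in range(count)]


def gains_for_item_numbers(item_numbers):
    """Look up the gain of each two-digit item number (00-99) in table 3.2.1."""
    return [GAINS[i] for i in item_numbers]


# The worked example in the text: random pairs 83, 09, 75, 90.
example = gains_for_item_numbers([83, 9, 75, 90])
print(example)

# A pseudorandom generator standing in for table A 1 (an assumption of this sketch).
rng = random.Random(2024)
sample = gains_for_item_numbers([rng.randrange(100) for _ in range(10)])
print(sample)
```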

Samples of 10 are suggested. For our present purposes all the samples must be of the same size because the distributions of their statistics change with n. It is well to record the items in columns, leaving a half dozen lines below each for subsequent computations. For your guidance, four samples are listed in table 3.3.1. The computations below them will be explained as we go along. Draw as many of the samples as you think you can process within the time at your command. If several are working together the results of each can be made available to all. Keep the records carefully because you will need them again and again.

TABLE 3.3.1

Four Samples of 10 Items Drawn at Random From the Pig Gains of Table 3.2.1, Each Followed by Statistics To Be Explained in Sections 3.4-3.8

Item number                    Sample number
and formulas            1            2            3            4
1                      33           32           39           17
2                      53           31           34           22
3                      34           11           33           20
4                      29           30           33           19
5                      39           19           33            3
6                      57           24           39           21
7                      12           53           36            3
8                      24           44           32           25
9                      39           19           32           40
10                     36           30           30           21
X̄                      35.6         29.3         34.1         19.1
s²                     169.8        151.6          9.0        112.3
s                       13.0         12.3          3.0         10.6
s_X̄                      4.11         3.89         0.95         3.35
t                        1.36        -0.18         4.32        -3.25
t_0.05 s_X̄               9.3          8.8          2.2          7.6
X̄ ± t_0.05 s_X̄       26.3-44.9    20.5-38.1    31.9-36.3    11.5-26.7
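The statistics in the lower rows of table 3.3.1 can be recomputed with a short Python sketch. It uses μ = 30 and the 5% two-tailed value t = 2.262 for 9 d.f. (the same value that bounds a class in table 3.8.1); `summarize` is a hypothetical helper. Because the code carries full precision, an occasional last digit may differ from the printed table, for example in the t_0.05 s_X̄ entry for sample 3.

```python
import math
import statistics

SAMPLES = {
    1: [33, 53, 34, 29, 39, 57, 12, 24, 39, 36],
    2: [32, 31, 11, 30, 19, 24, 53, 44, 19, 30],
    3: [39, 34, 33, 33, 33, 39, 36, 32, 32, 30],
    4: [17, 22, 20, 19, 3, 21, 3, 25, 40, 21],
}
MU = 30            # population mean of the pig gains
T_05_9DF = 2.262   # two-tailed 5% point of t with 9 d.f.


def summarize(sample, mu=MU, t_crit=T_05_9DF):
    """Mean, s^2, s, standard error of the mean, t, and the 95% interval for mu."""
    n = len(sample)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)   # divisor n - 1
    se = math.sqrt(s2 / n)             # s_X-bar = s / sqrt(n)
    t = (mean - mu) / se
    half_width = t_crit * se
    return {"mean": mean, "s2": s2, "s": math.sqrt(s2), "se": se, "t": t,
            "ci": (mean - half_width, mean + half_width)}


stats = {k: summarize(v) for k, v in SAMPLES.items()}
for k, r in stats.items():
    lo, hi = r["ci"]
    print(f"sample {k}: mean={r['mean']:.1f} s2={r['s2']:.1f} s={r['s']:.1f} "
          f"se={r['se']:.2f} t={r['t']:.2f} interval={lo:.1f}-{hi:.1f}")
```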

Each pig gain may be drawn as often as its number appears in the 
table of random digits — it is not withdrawn from circulation after being 
taken once. Thus, the sampling is always from the same population, and 
the probability of drawing any particular item is constant throughout 
the process. 

EXAMPLE 3.3.1 — Determine the range in each of your samples of n = 10. The mean of the ranges estimates σ/0.325 (table 2.4.1); that is, 10/0.325 = 30.8. How close is your estimate?
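If you want to compare your hand results with a longer run, the sketch below draws many samples of 10 with replacement from table 3.2.1 and averages their ranges; `mean_range` is a hypothetical helper, and `random.choices` stands in for the table of random digits. The average should come out near 10/0.325 = 30.8, though this finite, discrete population is only an approximation to the normal.

```python
import random
import statistics

# Table 3.2.1 as {gain: count}; expanded in increasing order so GAINS[i] is item i.
COUNTS = {
    3: 1, 7: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 2, 18: 3,
    19: 3, 20: 2, 21: 3, 22: 2, 23: 2, 24: 3, 25: 3, 26: 4, 27: 3, 28: 3,
    29: 4, 30: 10, 31: 4, 32: 2, 33: 5, 34: 3, 35: 3, 36: 3, 37: 2, 38: 2,
    39: 3, 40: 2, 41: 3, 42: 3, 43: 2, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1,
    49: 1, 53: 1, 57: 1,
}
GAINS = [gain for gain, count in sorted(COUNTS.items()) for _ in range(count)]


def mean_range(population, n=10, reps=5_000, seed=7):
    """Average range of `reps` samples of size n drawn with replacement."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # with replacement, as in section 3.3
        ranges.append(max(sample) - min(sample))
    return statistics.mean(ranges)


estimate = mean_range(GAINS)
print(f"mean range: {estimate:.1f};  sigma / 0.325 = {10 / 0.325:.1f}")
```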

3.4 — The distribution of sample means. First add the items in each sample, then put down the sample mean, X̄ (division is by 10). While every mean is an estimator of μ = 30 pounds, there is yet great variation among them. Make an array of the means of all your samples. If there are enough of them, group them into a frequency distribution like table 3.4.1.

Our laboratory means ranged from 19 to 39 pounds, perhaps to the 
novice a disconcerting variability. To assess the meaning of this, try 
to imagine doing an experiment resulting in one of these more divergent 
mean gains instead of the population value, 30 pounds. Having no infor- 
mation about the population except that furnished by the sample, you 
would be considerably misled. There is no way to avoid this hazard. 
One of the objects of the experimental samplings is to acquaint you with 
the risks involved in all conclusions based on small portions of the aggre- 
gate. The investigator seldom knows the parameters of the sampled 
population; he knows only the sample estimates. He learns to view his 
experimental data in the light of his experience of sampling error. His 
judgments must involve not only the facts of his sample but all the related 
information which he and others have accumulated. 

The more optimistic draw satisfaction from the large number of 
means near the center of the distribution. If this were not characteristic, 
sampling would not be so useful and popular. The improbability of 
getting poor estimates produces a sense of security in making inferences. 
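The laboratory exercise can be imitated by machine. This sketch draws 511 samples of 10 (matching the number behind table 3.4.1) with Python's `random.choices`, an assumption in place of table A 1, and reports the spread of the means alongside the theoretical values μ = 30 and σ/√n = 3.162. A different seed gives a different, equally legitimate set of means.

```python
import math
import random
import statistics

# Table 3.2.1 as {gain: count}; expanded in increasing order so GAINS[i] is item i.
COUNTS = {
    3: 1, 7: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 2, 18: 3,
    19: 3, 20: 2, 21: 3, 22: 2, 23: 2, 24: 3, 25: 3, 26: 4, 27: 3, 28: 3,
    29: 4, 30: 10, 31: 4, 32: 2, 33: 5, 34: 3, 35: 3, 36: 3, 37: 2, 38: 2,
    39: 3, 40: 2, 41: 3, 42: 3, 43: 2, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1,
    49: 1, 53: 1, 57: 1,
}
GAINS = [gain for gain, count in sorted(COUNTS.items()) for _ in range(count)]

rng = random.Random(511)
means = [sum(rng.choices(GAINS, k=10)) / 10 for _ in range(511)]

print(f"means range from {min(means):.1f} to {max(means):.1f}")
print(f"mean of means: {statistics.mean(means):.2f}   (theory: 30)")
print(f"SD of means:   {statistics.stdev(means):.2f}   (theory: {10 / math.sqrt(10):.3f})")
```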

Fitting the normal distribution. In constructing table 3.4.1, one-pound 
class intervals were used. Since all the means come out exactly to one 
decimal place, the class limits were taken as 19.5-20.4, 20.5-21.4, and 
so on. 

From theory, the distribution of sample means should be very close to normal, with mean μ = 30 pounds and standard deviation σ_X̄ = 10/√10 = 3.162 pounds. The theoretical frequencies appear in the right-hand column of table 3.4.1.

TABLE 3.4.1

Frequency Distribution of 511 Means of Samples of 10 Drawn From the Pig Gains in Table 3.2.1

Class limits (pounds)    Observed frequency    Theoretical frequency
Less than 19.5                    1                     0.20
19.5-20.4                         1                     0.46
20.5-21.4                         0                     1.12
21.5-22.4                         7                     2.56
22.5-23.4                         5                     5.47
23.5-24.4                        10                    10.48
24.5-25.4                        19                    18.09
25.5-26.4                        30                    28.46
26.5-27.4                        41                    40.52
27.5-28.4                        48                    52.12
28.5-29.4                        66                    60.76
29.5-30.4                        72                    64.18
30.5-31.4                        56                    61.32
31.5-32.4                        46                    53.25
32.5-33.4                        45                    41.65
33.5-34.4                        22                    29.59
34.5-35.4                        24                    19.11
35.5-36.4                        12                    11.09
36.5-37.4                         5                     5.88
37.5-38.4                         0                     2.76
Over 38.5                         1                     1.94
Total                           511                   511.01

To indicate how these are computed, let us check the frequency 28.46 for the class whose limits are 25.5-26.4. First we must take note of the fact that our computed means are discrete, since they change by intervals of 0.1, whereas the normal distribution is continuous. No computed mean in our samples can have a value of, say, 25.469, although the normal distribution allows such values. This discrepancy is handled by regarding any discrete mean as a grouping of all continuous values to which it is nearest. Thus, the observed mean of 25.5 represents all continuous values lying between 25.45 and 25.55. Similarly, the observed mean 26.4 represents the continuous values between 26.35 and 26.45. Hence for the class whose discrete limits are 25.5 and 26.4, we take the true class limits as 25.45 and 26.45. When fitting a continuous theoretical distribution to an observed frequency distribution, the true class limits must always be found in this way.

In order to use the normal table, we express the true limits in standard measure. For X̄ = 25.45, μ = 30, σ_X̄ = 3.162, we have

$$Z_1 = \frac{\bar X - \mu}{\sigma_{\bar X}} = \frac{25.45 - 30}{3.162} = -1.439$$

For X̄ = 26.45, we find Z₂ = −1.123. From table A 3 (p. 548) we read the area of the normal curve between −1.123 and −1.439. By symmetry, this is also the area between 1.123 and 1.439. Linear interpolation in the table is required. The area from 0 to 1.43 is 0.4236 and from 0 to 1.44 is 0.4251. Hence, by linear interpolation, the area from 0 to 1.439 is

$$(0.9)(0.4251) + (0.1)(0.4236) = 0.4250$$

Similarly, the area from 0 to 1.123 is 0.3693, so that the required area is 0.4250 − 0.3693 = 0.0557. Finally, since there are 511 means in the frequency distribution, the theoretical frequency in this class is (511)(0.0557) = 28.46.

To summarize, the steps in fitting a normal distribution are: (i) Find the true class limits. (ii) Express each limit in standard measure, getting a series of values Z₁, Z₂, Z₃, .... (iii) From table A 3, read the areas from 0 to Z₁, 0 to Z₂, 0 to Z₃, .... (iv) The theoretical probabilities in the classes are the areas from −∞ to Z₁, from Z₁ to Z₂, from Z₂ to Z₃, and so on, ending with the area from Z_k to +∞, where Z_k is the lower limit of the highest class. The area from −∞ to Z₁ is 0.5 − (area from 0 to Z₁), and the area from Z_k to +∞ is 0.5 − (area from 0 to Z_k). The intermediate areas are all found by subtraction as in the numerical illustration. The only exception is the area that straddles the mean, say from Z_u to Z_(u+1). Here, Z_u will be negative and Z_(u+1) positive. In this case we add the area from 0 to Z_u and that from 0 to Z_(u+1). (v) Finally, multiply each area by the total observed frequency.
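Steps (i) to (v) can be sketched in a few lines of Python. Two choices here differ from the hand method and are assumptions of the sketch: the normal areas come from `math.erf` rather than from table A 3 with linear interpolation, and the class probabilities are taken as differences of cumulative areas from −∞, which removes the special case for the class straddling the mean. The results agree with table 3.4.1 to about the second decimal; small differences in the extreme tails reflect rounding in the printed table. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability (area from -infinity to z), via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fitted_normal_frequencies(true_limits, mu, sd, total):
    """Expected class frequencies for a fitted normal curve.

    `true_limits` are the boundaries between adjacent classes, in increasing
    order (step i); the first class runs from -infinity, the last to +infinity.
    """
    z = [(x - mu) / sd for x in true_limits]                 # step (ii)
    cum = [0.0] + [normal_cdf(v) for v in z] + [1.0]         # areas from -infinity
    return [total * (upper - lower) for lower, upper in zip(cum, cum[1:])]  # (iv), (v)


# Table 3.4.1: true limits 19.45, 20.45, ..., 38.45 give 21 classes.
limits = [19.45 + k for k in range(20)]
freqs = fitted_normal_frequencies(limits, mu=30.0, sd=10 / math.sqrt(10), total=511)
print(f"25.5-26.4: {freqs[7]:.2f}")    # text: 28.46
print(f"29.5-30.4: {freqs[11]:.2f}")   # Example 3.7.1: 64.18
print(f"total:     {sum(freqs):.2f}")
```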

If you have used the same class limits as in table 3.4.1 but have drawn a different number of samples, say 200, multiply the theoretical frequencies in table 3.4.1 by 200/511 to obtain your comparable theoretical frequencies. If you used two-pound classes, as is advisable with a smaller number of samples, add the theoretical frequencies in table 3.4.1 in appropriate pairs and multiply by the relative sample sizes.

It is clear from table 3.4.1 that the observed frequencies are a good 
fit to the theoretical frequencies. 

3.5 — Sampling distributions of s² and s. For each sample, calculate s² by the shortcut formula,

$$s^2 = \frac{\sum X^2 - (\sum X)^2/10}{9}$$

Four values of s² are shown in table 3.3.1. Three of them overestimate σ² = 100, while the fourth is notably small. Examine any of your samples with unusual s² to learn what peculiarities of the sample are responsible. The freakish sample 3 in the table has a range of only 39 − 30 = 9 pounds, with not a single member less than μ. This sample gave the smallest s² that appeared in our set of 511 values.
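The shortcut formula and the definition by deviations give the same s²; a small sketch comparing them on sample 3 of table 3.3.1 (the helper names are illustrative). With data recorded to many significant figures, the shortcut can lose accuracy through the subtraction of two nearly equal large numbers, so subtracting a working mean first (section 2.10) is a sensible precaution.

```python
def s2_shortcut(xs):
    """Shortcut formula: {sum X^2 - (sum X)^2 / n} / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def s2_deviations(xs):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


sample3 = [39, 34, 33, 33, 33, 39, 36, 32, 32, 30]
print(f"shortcut:   {s2_shortcut(sample3):.3f}")
print(f"deviations: {s2_deviations(sample3):.3f}")
print(f"range:      {max(sample3) - min(sample3)}")
```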

The distribution of s² in our 511 samples is displayed in table 3.5.1. Notice its skewness, with bunching below the mean and a long tail above, resembling the chi-square distribution of chapter 1, though less extreme. Despite this, the mean of the values of s² is 101.5, closely approximating the population variance, 100, and verifying the fact that s² is an unbiased estimator of σ².
TABLE 3.5.1

Observed and Theoretical Distributions of 511 Mean Squares, s², of Normal Samples With n = 10

Class mark     20    40    60    80   100   120   140   160   180   200   220   240   260   280   300   320   340
Observed       12    47    92    93    72    73    42    29    26    11     8     2     1     0     1     1     1
Theoretical  12.8  50.8  84.8  94.7  84.5  65.2  45.9  29.6  18.4  10.8   6.1   3.4   3.8*

* Combined frequency in the last 5 classes


Our distribution of s, shown in table 3.5.2, has a slight skewness (not as large as that of s²) as well as a small bias, with mean 9.8 pounds, slightly less than σ = 10 pounds. Even in samples as small as 10 the bias is unimportant in a single estimate s.


TABLE 3.5.2

Frequency Distribution of 511 Sample Standard Deviations Corresponding to the Mean Squares of Table 3.5.1


Class mark    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18

Frequency 

l 2 9 18 58 V 

80 

71 

79 

44 41 

V 

11 i 

8 

3 1 ,i " 2 

s i 


The theoretical distribution of s². We have already mentioned that the distribution of s² in normal samples is closely related to the chi-square distribution. First, we give a general definition of the chi-square distribution. If Z₁, Z₂, ..., Z_f are independently drawn random normal deviates, the quantity

$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_f^2$$

follows the chi-square distribution with f degrees of freedom. Thus, chi-square with f degrees of freedom is defined as the distribution followed by the sum of squares of f independent normal deviates. The form of this distribution was worked out mathematically. It could, alternatively, be examined by experimental sampling. By expressing the 100 gains in table 3.2.1 in standard measure, we would have a set of normal deviates from which we could draw samples of size f, computing χ² as defined above for each sample. For more accurate work, there are tables of random normal deviates (1)(2) that provide a basis for such samplings. Table A 5 (p. 550) presents the percentage points of the χ² distribution. It will be much used at various points in this book.
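Such an experimental sampling is easy to sketch by machine, with Python's `random.gauss` standing in for a table of random normal deviates (a substitution, not the tables cited). Each draw is the sum of squares of f = 9 standard normal deviates; the average should be close to f, a result stated later in this section, and no value can be negative. The helper name is hypothetical.

```python
import random
import statistics


def chi_square_draws(f, reps=20_000, seed=3):
    """Sums of squares of f independent standard normal deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(f)) for _ in range(reps)]


draws = chi_square_draws(f=9)
print(f"mean of chi-square draws: {statistics.mean(draws):.2f}  (theory: 9)")
print(f"smallest draw: {min(draws):.3f}  (chi-square is never negative)")
```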

A second result from theory is that if s² is a mean square with f degrees of freedom, computed from a normal population that has variance σ², then the quantity fs²/σ² follows the chi-square distribution with f degrees of freedom. This is an exact mathematical result. Since our sample variances have (n − 1) d.f., the relation is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
We cannot present a proof of this result, but a little algebra makes the relation between s² and χ² clearer. Remember that (n − 1)s² is the sum of squares of deviations, Σ(X − X̄)². Introduce μ as a working mean. From the identity for working means (section 2.10) we have

$$\frac{(n-1)s^2}{\sigma^2} = \frac{(X_1-\mu)^2}{\sigma^2} + \frac{(X_2-\mu)^2}{\sigma^2} + \cdots + \frac{(X_n-\mu)^2}{\sigma^2} - \frac{n(\bar X-\mu)^2}{\sigma^2}$$

Now, the quantities (X₁ − μ)/σ, (X₂ − μ)/σ, ..., (Xₙ − μ)/σ are all in standard measure: in other words, they are random normal deviates. And the quantity √n(X̄ − μ)/σ is another normal deviate, since the standard deviation of X̄ is σ/√n. Hence we may write

$$\frac{(n-1)s^2}{\sigma^2} = Z_1^2 + Z_2^2 + \cdots + Z_n^2 - Z_{n+1}^2$$

Thus, (n − 1)s²/σ² is the sum of squares of n normal deviates, minus the square of one normal deviate, whereas χ², with (n − 1) d.f., is the sum of the squares of (n − 1) normal deviates. It is not difficult to show mathematically that these two distributions are the same in this case.
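The algebraic identity above can be checked numerically. This sketch applies it to sample 1 of table 3.3.1, taking μ = 30 and σ = 10 from the pig population: the left side is the sum of squared deviations divided by σ², and the right side is the sum of the n squared deviates minus the square of Z_(n+1) = √n(X̄ − μ)/σ. Both come to 1528.4/100 = 15.284.

```python
import math

sample1 = [33, 53, 34, 29, 39, 57, 12, 24, 39, 36]
mu, sigma = 30.0, 10.0
n = len(sample1)
xbar = sum(sample1) / n

lhs = sum((x - xbar) ** 2 for x in sample1) / sigma ** 2   # (n - 1) s^2 / sigma^2
z = [(x - mu) / sigma for x in sample1]                    # Z_1, ..., Z_n
z_extra = math.sqrt(n) * (xbar - mu) / sigma               # Z_(n+1)
rhs = sum(v * v for v in z) - z_extra ** 2

print(f"left side:  {lhs:.4f}")
print(f"right side: {rhs:.4f}")
```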

The theoretical frequencies for our 511 values of s² appear in the last line of table 3.5.1. Again, the agreement with the observed frequencies is good. For fitting this distribution, table A 5 is not very convenient. We used the table in reference (3), which gives, for specified values of χ², the probability of exceeding the specified value.

From the definition of the chi-square distribution, we see that chi- 
square with 1 degree of freedom is the distribution followed by the square 
of a single normal deviate. Later (chapter 8) we shall show that the chi- 
square test criterion which we encountered in chapter 1 when testing a 
proportion is approximately distributed as the square of a normal 
deviate. 

Like the normal distribution, the theoretical distribution of chi-square is continuous. Unlike the normal, χ², being a sum of squares, cannot take negative values, so that the distribution extends from 0 to ∞, whereas the normal, of course, extends from −∞ to +∞. An important result from theory is that the mean value of χ² with f degrees of freedom is exactly f. Since s² = χ²σ²/f, a consequence of this result is that the mean value of s², in its theoretical distribution, is exactly σ². This verifies the result mentioned in chapter 2 when we stated that s² is an unbiased estimator of σ². The property that s² is unbiased does not require normality, but only that the sample be a random sample.

3.6 — Interval estimates of σ². With continuous populations, our attention thus far has centered on the problem of estimating the population mean from a sample. In studying the precision of measuring instruments and in studying variability in populations, we face the problem of estimating the population variance σ² from a sample. If the population is normal, the χ² table can be used to compute a confidence interval for σ² from a sample value s².

The entries in the chi-square table (p. 550) are the values of χ² that are exceeded with the probabilities stated at the heads of the columns. For a 95% confidence interval, the relevant quantities are χ²_0.975, the value of chi-square exceeded with probability 0.975, and χ²_0.025, the value of chi-square exceeded with probability 0.025. Hence, the probability that a value of χ² drawn at random lies between these two limits is 0.975 − 0.025 = 0.95. Since χ² = fs²/σ², the probability is 95% that when our sample was drawn,

$$\chi^2_{0.975} < \frac{fs^2}{\sigma^2} < \chi^2_{0.025}$$

Multiplying through by σ², we have

$$\sigma^2\chi^2_{0.975} < fs^2 < \sigma^2\chi^2_{0.025}$$

The reader may verify that these inequalities are equivalent to the following,

$$\frac{fs^2}{\chi^2_{0.025}} < \sigma^2 < \frac{fs^2}{\chi^2_{0.975}}$$

This is the general formula for 95% confidence limits. With s² computed from a sample of size n, we have f = n − 1, and fs² is the sum of squares of deviations, Σx². The simplest form for computing is, therefore,

$$\frac{\sum x^2}{\chi^2_{0.025}} < \sigma^2 < \frac{\sum x^2}{\chi^2_{0.975}}$$

As an illustration we shall set confidence limits on σ² for the population of vitamin C concentrations sampled in section 2.4. For these data, Σx² = 254, d.f. = 16, s² = 15.88. From table A 5, χ²_0.975 = 6.91 and χ²_0.025 = 28.8. Substituting,

$$\frac{254}{28.8} < \sigma^2 < \frac{254}{6.91}$$

that is,

$$8.82 < \sigma^2 < 36.76,$$

gives the confidence interval for σ². Unless a 1-in-20 chance has occurred in the sampling, σ² lies between 8.82 and 36.76. To obtain confidence limits for σ, take the square roots of these limits. The limits for σ are 2.97 and 6.06 mg./100 gm. Note that s = 3.98 is not in the middle of the interval, since the distribution of s is skew.
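The vitamin C computation can be expressed as a small function. The chi-square values must still be read from table A 5 (here the two quoted above for 16 d.f.); the helper name `variance_confidence_limits` is hypothetical.

```python
import math


def variance_confidence_limits(sum_sq_dev, chi2_025, chi2_975):
    """95% limits for sigma^2: (sum x^2 / chi2_0.025, sum x^2 / chi2_0.975).

    Subscripts give the probability of exceeding the tabled chi-square value.
    """
    return sum_sq_dev / chi2_025, sum_sq_dev / chi2_975


lo, hi = variance_confidence_limits(254, chi2_025=28.8, chi2_975=6.91)
print(f"{lo:.2f} < sigma^2 < {hi:.2f}")
print(f"{math.sqrt(lo):.2f} < sigma < {math.sqrt(hi):.2f}   (s = {math.sqrt(254 / 16):.2f})")
```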
Large samples are necessary if σ is to be estimated accurately. For illustration, assume that by an accurate estimate we mean one that is known, with confidence probability 95%, to be correct to within ±10%. If our estimate s is 100, the confidence limits for σ should be 90 and 110. Consider a sample of size 101, giving 100 d.f. in s². From the last line of table A 5, with s² = 10,000, the 95% limits for σ² are 7,720 and 13,470, so that those for σ are 87.9 and 116. Thus, even a sample of 101 does not produce limits that are within 10% of the estimated value. For a sample of size 30, with s = 100, the limits are 80 and 134. The estimate could be in error by more than 20%.
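The widths quoted above can be reproduced with the same kind of computation. The 100 d.f. value 129.56 appears in Example 3.7.5; the other percentage points (74.22 for 100 d.f., and 45.72 and 16.05 for 29 d.f.) are standard table values supplied here as assumptions, since they are not printed in this section. The helper `sigma_limits` is hypothetical.

```python
import math


def sigma_limits(s, df, chi2_025, chi2_975):
    """95% confidence limits for sigma from an estimate s with df degrees of freedom."""
    sum_sq_dev = df * s ** 2
    return math.sqrt(sum_sq_dev / chi2_025), math.sqrt(sum_sq_dev / chi2_975)


# (sample size n, chi^2_0.025, chi^2_0.975) for d.f. = n - 1; sources as noted above.
cases = [(101, 129.56, 74.22), (30, 45.72, 16.05)]
limits = {n: sigma_limits(100.0, n - 1, c025, c975) for n, c025, c975 in cases}
for n, (lo, hi) in limits.items():
    print(f"n = {n}: {lo:.1f} < sigma < {hi:.1f}   (s = 100)")
```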

The frequency distribution of s² is sensitive to non-normality in the original population, and can be badly disturbed by any gross errors that occur in the sample. This effect of non-normality is discussed further in section 3.15.

3.7 — Test of a null hypothesis value of σ². Situations in which it is necessary to test whether a sample value of s² is consistent with a postulated population value of σ² are not too frequent in practice. This problem does arise, however, in some applications in which σ² has been obtained from a very large sample and may be assumed known. In others, in genetics for example, a value of σ² may be predicted from a theory that is to be tested. The following examples indicate how the test is made.

Let the null hypothesis value of σ² be σ₀². Usually, the tests wanted are one-tailed tests. When the alternative is σ² > σ₀², compute

$$\chi^2 = \frac{fs^2}{\sigma_0^2} = \frac{\sum x^2}{\sigma_0^2}$$

This value is significant, at the 5% level, if it exceeds χ²_0.050 with f degrees of freedom. Suppose that an investigator has used for years a stock of inbred rats whose weights have σ₀ = 26 grams. He considers switching to a cheaper source of supply of rats, except that he suspects that the new rats will show greater variability. An experiment on 20 new rats gave Σx² = 23,000, s = 35 grams, in line with his suspicions. As a check he tests the null hypothesis σ = 26 grams against the alternative σ > 26 grams.

$$\chi^2 = \frac{23{,}000}{(26)^2} = 34.02, \qquad \text{d.f.} = 19$$

In table A 5, χ²_0.050 is 30.14, so that the null hypothesis is rejected.

To test H_a: σ² < σ₀², reject at the 5% level if χ² < χ²_0.950. To illustrate, a standard method of performing an intricate chemical analysis gives σ₀ = 4.9 parts per 1,000 for the content of some chemical constituent. A refinement on the analysis, which may improve the precision and cannot make it worse, gave s = 4.1, based on 49 d.f. We have

$$\chi^2 = \frac{(49)(4.1)^2}{(4.9)^2} = 34.3$$

Table A 5 gives χ²_0.950 = 34.76 for f = 50 and 26.51 for f = 40. Interpolating linearly, we find χ²_0.950 = 33.9 for f = 49. Formally, the null hypothesis would not be rejected, though the significance probability is very close to 5%.

If H_a is the two-sided alternative σ² ≠ σ₀², the region of rejection is χ² < χ²_0.975 or χ² > χ²_0.025.
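Both numerical illustrations in this section can be sketched in Python. The critical values are those quoted from table A 5, and the interpolation for 49 d.f. is the linear one used in the text; the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(sum_sq_dev, sigma0):
    """chi^2 = f s^2 / sigma0^2 = (sum of squared deviations) / sigma0^2."""
    return sum_sq_dev / sigma0 ** 2


# Alternative sigma > sigma0: inbred rats, sigma0 = 26 grams, 20 new rats (19 d.f.).
rat_chi2 = chi_square_statistic(23_000, 26)
rat_reject = rat_chi2 > 30.14          # chi^2_0.050 for 19 d.f. (table A 5)

# Alternative sigma < sigma0: chemical analysis, sigma0 = 4.9, s = 4.1 with 49 d.f.
f = 49
chem_chi2 = chi_square_statistic(f * 4.1 ** 2, 4.9)
# Linear interpolation between chi^2_0.950 = 26.51 (f = 40) and 34.76 (f = 50).
chem_crit = 26.51 + (f - 40) / (50 - 40) * (34.76 - 26.51)
chem_reject = chem_chi2 < chem_crit

print(f"rats:     chi2 = {rat_chi2:.2f}, reject H0: {rat_reject}")
print(f"chemical: chi2 = {chem_chi2:.1f}, critical = {chem_crit:.1f}, reject H0: {chem_reject}")
```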

EXAMPLE 3.7.1 — For the fitted normal distribution in table 3.4.1, verify the theoretical frequencies (i) 1.94 for the class "Over 38.5" and (ii) 64.18 for the class "29.5-30.4".

EXAMPLE 3.7.2 — If half the standard deviations in table 3.5.2 were expected to be less than σ = 10 pounds, as would be true if s were symmetrically distributed about σ, calculate χ² = 4.89, with 1 d.f., for the sample. The fact that χ² is significant is evidence against a symmetrical distribution in the population.

EXAMPLE 3.7.3 — In a sample of 61 patients, the amount of an anesthetic required to produce anesthesia suitable for surgery was found to have a standard deviation (from patient to patient) of s = 10.2 mg. Compute 90% confidence limits for σ. Ans. 8.9 and 12.0 mg. Use χ²_0.950 and χ²_0.050.

EXAMPLE 3.7.4 — With routine equipment like light bulbs, which wear out after a time, the standard deviation of the length of life is an important factor in determining whether it is cheaper to replace all the pieces at fixed intervals or to replace each piece individually when it breaks down. For a certain gadget, an industrial statistician has calculated that it will pay to replace at fixed intervals if σ < 6 days. A sample of 71 pieces gives s = 4.2 days. Examine this question (i) by finding the upper 95% limit for σ, (ii) by testing the null hypothesis σ = σ₀ = 6 days against the alternative σ < 6 days. Ans. (i) The upper 95% limit is 5.0. (ii) H₀ is rejected at the 5% level. Notice that the two procedures are equivalent; if the upper confidence limit had been 6.0 days, the chi-square value would be at the 5% significance level.

EXAMPLE 3.7.5 — For d.f. greater than 100, which are not shown in table A 5, an approximation due to R. A. Fisher is that √(2χ²) is normally distributed with mean √(2f − 1) and standard deviation 1. Check this approximation by finding the value that it gives for χ²_0.025 when f = 100, the correct value being 129.56. Ans. 129.1.
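A sketch of the approximation in Example 3.7.5: solving √(2χ²) = √(2f − 1) + z for χ², with z = 1.96 for the upper 2.5% point. The helper name `fisher_chi2_approx` is illustrative.

```python
import math


def fisher_chi2_approx(f, z):
    """Fisher: sqrt(2*chi2) ~ Normal(mean sqrt(2f - 1), SD 1), solved for chi2."""
    return (math.sqrt(2 * f - 1) + z) ** 2 / 2


approx = fisher_chi2_approx(100, 1.96)   # upper 2.5% point uses z = 1.96
print(f"approximate chi2_0.025 for 100 d.f.: {approx:.2f}  (table value 129.56)")
```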

3.8 — The distribution of t. Returning to our experimental samples, we are ready to examine the t-distribution for 9 degrees of freedom. Since X̄ and s_X̄ have already been calculated for each of your samples of 10, the sample value of t may now be got by putting μ = 30, the formula being

$$t = \frac{\bar X - 30}{s_{\bar X}}$$

Here, t will be positive or negative according as X̄ is greater or less than 30 pounds. In the present sampling the two signs are equally likely, so you may expect about half of each. On account of this symmetry the mean of all your t should be near zero.

The four samples in table 3.3.1 were selected to illustrate the manner
in which large, small, and intermediate values of t arise in sampling. A
small deviation $(\bar{X} - \mu)$ or a large sample standard error tends to make t
small. Some striking combinations are put in the table, and you can
doubtless find others among your samples.
TABLE 3.8.1

Sample and Theoretical Distributions of t. Samples of 10, Degrees of Freedom 9

  Interval of t          Sample            Theoretical Percentage Frequency
                                                          Cumulative
  From      To       Frequency  Percent   Per Class   One Tail  Both Tails
            -3.250       3        0.6        0.5        100.0
  -3.250    -2.821       4        0.8        0.5         99.5
  -2.821    -2.262       5        1.0        1.5         99.0
  -2.262    -1.833      16        3.1        2.5         97.5
  -1.833    -1.383      31        6.1        5.0         95.0
  -1.383    -1.100      25        4.9        5.0         90.0
  -1.100    -0.703      52       10.2       10.0         85.0
  -0.703     0.0       132       25.8       25.0         75.0
   0.0       0.703     126       24.6       25.0         50.0     100.0
   0.703     1.100      41        8.0       10.0         25.0      50.0
   1.100     1.383      32        6.3        5.0         15.0      30.0
   1.383     1.833      18        3.5        5.0         10.0      20.0
   1.833     2.262      13        2.5        2.5          5.0      10.0
   2.262     2.821       8        1.6        1.5          2.5       5.0
   2.821     3.250       2        0.4        0.5          1.0       2.0
   3.250                 3        0.6        0.5          0.5       1.0
  Total                511      100.0      100.0



The distribution of the laboratory sample of t is displayed in table
3.8.1. The class intervals in the present table are unequal, adjusted so as
to bring into prominence certain useful probabilities in the tails of the
distribution. The theoretical percentage frequencies are recorded for
comparison with those of the sample. The agreement is remarkably good.
In the last two columns are the cumulative percentage frequencies, which
make the table convenient for confidence statements and tests of
hypotheses. Examination of the table reveals that 2.5% of all t-values in
samples of 10 theoretically fall beyond 2.262, while another 2.5% of values
are smaller than $-2.262$. Combining these two tails of the distribution,
as shown in the last column, 5% of all t in samples of 10 lie farther from
the center than 2.262 in absolute value, which is therefore the 5% level of t.
Make a distribution of your own sample t to be compared with the theoretical
distribution in the table.
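If the laboratory samples are not at hand, a simulated version of the experiment can stand in for them. This Python sketch assumes a normal population with $\mu = 30$ pounds and an arbitrary $\sigma = 10$ (the distribution of t does not depend on $\sigma$), draws many samples of 10, and reports how often $|t| > 2.262$ and how often t is positive; both proportions should come out near the theoretical 5% and 50%. The function name and seed are illustrative choices.

```python
import random
import statistics

def simulate_t(num_samples, n=10, mu=30.0, sigma=10.0, seed=1):
    """Draw samples of size n from a normal population and return their t values."""
    rng = random.Random(seed)
    ts = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s_xbar = statistics.stdev(sample) / n ** 0.5   # standard error of the mean
        ts.append((xbar - mu) / s_xbar)
    return ts

ts = simulate_t(20000)
beyond = sum(abs(t) > 2.262 for t in ts) / len(ts)
positive = sum(t > 0 for t in ts) / len(ts)
print(f"fraction |t| > 2.262: {beyond:.3f}  (theory 0.050)")
print(f"fraction t > 0:       {positive:.3f}  (theory 0.500)")
```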

Our t table, table A 4, is a two-tailed table because most applications
of the t-distribution call for two-sided confidence limits and two-tailed
tests of significance. If you need a table that gives the probability for
specified values of t instead of t for specified probabilities, see (4).


3.9 — The interval estimate of $\mu$; the confidence interval. The theory of
the confidence interval may now be verified from your sampling. Each
sample specifies an interval, $\bar{X} \pm t_{0.05} s_{\bar X}$, said to cover $\mu$. In each of your
samples, substitute the estimators $\bar{X}$ and $s_{\bar X}$, together with $t_{0.05} = 2.262$,
the 0.05 level for 9 d.f. Finally, if you say, for any particular sample, that
the interval includes $\mu$, you will be either right or wrong; which it is may be
determined readily because you know that $\mu = 30$ pounds. The theory
will be verified if about 95% of your statements are right and about 5%
wrong.

Table 3.3.1 (p. 69) gives the steps in computing confidence limits for 
four samples. The intervals given by these four samples are, respectively, 

26.3 to 44.9 

20.5 to 38.1 

31.9 to 36.3 

11.5 to 26.7 

Sample 1 warrants the statement that $\mu$ lies between 26.3 and 44.9
pounds, and we know that this interval does contain $\mu$, as does likewise
the interval from sample 2. On the contrary, samples 3 and 4 illustrate
cases leading to false statements, one because of an unusually divergent
sample mean, the other because of a small sample standard deviation.
Sample 3 is particularly misleading: not only does it miss the mark, but
the narrow confidence interval suggests that we have an unusually
accurate estimate. Of the 511 laboratory samples, 486 resulted in correct
statements about $\mu$; that is, 95.1% of the statements were true. The
percentage of false statements, 4.9%, closely approximated the theoretical
5%. Always bear in mind the condition involved in every confidence
statement at the 5% level: it is right unless a 1-in-20 chance has occurred
in the sampling.
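The same kind of simulation checks the 95% coverage claim. The sketch below, again assuming a normal population with $\mu = 30$ and an arbitrary $\sigma = 10$, counts how many intervals $\bar{X} \pm 2.262\, s_{\bar X}$ from samples of 10 contain $\mu$. It illustrates the idea and is no substitute for the laboratory sampling; the function name and seed are hypothetical.

```python
import random
import statistics

def coverage(num_samples, n=10, mu=30.0, sigma=10.0, t_crit=2.262, seed=7):
    """Fraction of intervals Xbar +/- t_crit * s_Xbar that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        half_width = t_crit * statistics.stdev(sample) / n ** 0.5
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / num_samples

rate = coverage(5000)
print(f"{rate:.3f}")  # should be close to 0.95, like 486/511 = 0.951 in the text
```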

Practical applications of this theory are made by people doing experiments
and other samplings without knowledge of the population parameters.
When they make confidence statements, they do not know whether they
are right or wrong; they know only the probability selected.

EXAMPLE 3.9.1 — Using the sample frequencies of table 3.8.1, test the hypothesis
(known to be true) that the t-distribution is symmetrical in the sense that half of the
population frequency is greater than zero. Ans. $\chi^2 = 1.22$.


EXAMPLE 3.9.2 — From table 3.8.1 it is seen that 3 + 4 + 5 + 8 + 2 + 3 = 25 samples
have $|t| > 2.262$. Test the hypothesis that 5% of the population values are greater than
2.262 in absolute value. Ans. $\chi^2 = 0.0124$.
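Examples 3.9.1 and 3.9.2 are both two-class $\chi^2$ tests of the form $\Sigma(\text{Obs.} - \text{Exp.})^2/\text{Exp.}$, here without a continuity correction. A short Python sketch using the frequencies of table 3.8.1 reproduces both answers to the accuracy shown; the helper name is ours.

```python
def chi_square(observed, expected):
    """Pearson chi-square: sum of (Obs - Exp)^2 / Exp over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n = 511
# Example 3.9.1: t < 0 versus t > 0, from the sample frequencies of table 3.8.1.
negative = 3 + 4 + 5 + 16 + 31 + 25 + 52 + 132
positive = n - negative
chi_sym = chi_square([negative, positive], [n / 2, n / 2])

# Example 3.9.2: |t| > 2.262 versus |t| <= 2.262, hypothesized 5% and 95%.
beyond = 3 + 4 + 5 + 8 + 2 + 3
chi_tail = chi_square([beyond, n - beyond], [0.05 * n, 0.95 * n])

print(f"{chi_sym:.2f} {chi_tail:.3f}")  # about 1.22 and 0.012
```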

EXAMPLE 3.9.3 — In table 3.8.1, accumulate the sample frequencies in both tails and
compare their percentage values with those in the last column of the table.

EXAMPLE 3.9.4 — During the fall of 1943, approximately one in each 1,000 city
families of Iowa (cities are defined as having 2,500 inhabitants or more) was visited to learn
the number of quarts of food canned. The average for 300 families was 165 quarts with
standard deviation 153 quarts. Calculate the 95% confidence limits. Ans. 165 ± 17 quarts.
EXAMPLE 3.9.5 — The 1940 census reported 312,000 dwelling units (roughly the same
as families) in Iowa cities. From the statistics of the foregoing example, estimate the
number of quarts of food canned in Iowa cities in 1943. Ans. 51,500,000 quarts with 95%
confidence limits 46,200,000 and 56,800,000 quarts.
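The arithmetic of examples 3.9.4 and 3.9.5 is sketched below. With 299 d.f. it assumes the normal value 1.96 in place of t, and it rounds the half-width to 17 quarts before scaling up by the 312,000 dwelling units, which appears to be how the stated answers were obtained.

```python
import math

n, mean, sd = 300, 165.0, 153.0          # Example 3.9.4
z = 1.96                                 # assumed: normal value used for 299 d.f.
half_width = z * sd / math.sqrt(n)       # t times the standard error, about 17.3 quarts
hw = round(half_width)                   # rounded to 17, as in the stated answer
print(f"95% limits: {mean:.0f} +/- {hw} quarts")

dwellings = 312_000                      # Example 3.9.5: Iowa city dwelling units, 1940
total = dwellings * mean
low, high = dwellings * (mean - hw), dwellings * (mean + hw)
print(f"total {total:,.0f} quarts; limits {low:,.0f} to {high:,.0f}")
```

Rounded to the nearest 100,000, these give the 51,500,000, 46,200,000, and 56,800,000 quarts of the answer.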

3.10 — Use of frequency distributions for computing $\bar{X}$ and s. In this
chapter we have used frequency distributions, formed by grouping the
sample data into classes, to give a picture of the way in which a variable is
distributed in a population. A frequency distribution also provides a
shortcut method of computing $\bar{X}$ and s from a large sample. For this
calculation, at least 12 classes are advisable, and for highly accurate work,
at least 20 classes. The reason will be indicated presently.

After forming the classes and counting the frequency in each class, 
write down the class mark (center of the class) for each class. Normally, 
the class mark is found by noting the lower and the upper limits of the 
class, and taking the average of these two values. For instance, with data 
that are originally recorded to whole numbers, the class limits might be 
0-9, 10-19, and so on. The class marks are 4.5, 14.5, and so on. Note 
that the marks are not 5, 15, etc., as we might hastily conclude. 

The assumptions made in the shortcut computation are that the
class mark is very close to the actual mean of the items in the class, and
that these items are approximately evenly distributed throughout the
class. These assumptions are likely to hold well in the high-frequency
classes near the middle of the distribution. Caution is necessary if there
are natural groupings in the scale of measurement. An instance was
observed where the number of seed compartments in tomatoes was the
variable, its values being confined to whole numbers and halves.
However, halves occurred very infrequently. At first, the class intervals were
chosen to extend from 2 up to but not including 3, etc., the class marks
being written down as 2 1/2, 3 1/2, etc. Actually, the class means were
almost at the lower boundaries, 2, 3, etc. This systematic error led to an
overestimate of almost half a seed compartment in the mean. In this
situation the actual class means should be computed and used as the class
marks (see example 3.11.3).

The same problem can arise in the extreme classes in a frequency
distribution. To revert to the example with intervals 0-9, 10-19, etc. and
class marks taken as 4.5, 14.5, etc., we might notice that the lowest class
contained six 0's, one 2, and one 6, so that the class mean is actually 1.0,
whereas the class mark is 4.5. For accurate work the class mark for this
class is taken as 1.0.

In the shortcut computation of X and s, each item in the sample is 
replaced by the class mark for the class in which it lies. All values be- 
tween 10 and 19 in the previous example are replaced by 14.5. The process 
is exactly the same as that of rounding to the nearest whole number, or 
the nearest 100. This rounding introduces an additional error into the 
data. The argument for having a relatively large number of classes is to 
keep this error small. 



The remainder of this section discusses how much accuracy is lost
owing to this rounding error. Let X represent any item in the sample and
let X′ be the corresponding class mark or rounded value. Then we may
write

$$X' = X + e$$

where e is the rounding error. If I is the width of the class interval, the
values e are assumed to be roughly evenly distributed over the range from
$-I/2$ to $+I/2$. An important result from theory is that the variance of
the sum of two independent variables is the sum of their variances. This
gives

$$\sigma_{X'}^2 = \sigma_X^2 + \sigma_e^2$$

If e is uniformly distributed between $-I/2$ and $+I/2$, it is known from
theory that its variance is $I^2/12$. Hence,

$$\sigma_{X'}^2 = \sigma_X^2 + I^2/12 = \sigma^2 + I^2/12,$$

since $\sigma_X^2$ is the original population variance $\sigma^2$.

Consequently, when a value X is replaced by the corresponding class
mark X′, the variance is increased by $I^2/12$ due to the rounding. The
relative increase in variance is $I^2/(12\sigma^2)$. We would like this increase to be
small.

Suppose that there are 12 classes in the frequency distribution. If
the distribution is not far from normal, nearly all the frequency lies within
a distance $\pm 3\sigma$ from $\mu$. Since these classes cover a range of $6\sigma$, I will
be roughly $6\sigma/12 = \sigma/2$. Thus the relative increase in the variance of
X due to grouping is about $(\sigma/2)^2/(12\sigma^2) = 1/48$, or 2%. A further analysis,
not presented here, shows that the computed $s'^2$ has a variance about 4% larger than
that of the original $s^2$ (5). For ordinary work these small losses in
accuracy to save time in computation are tolerable. For accurate work, the
advice commonly given is that I should not exceed $\sigma/4$. This requires
about 24 classes to cover the frequency distribution when the sample is
large.
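The result $\sigma_{X'}^2 = \sigma^2 + I^2/12$ can be checked by simulation. The sketch below assumes normal data with $\sigma = 10$ grouped into classes of width I = 5, that is $\sigma/2$, which corresponds to the 12-class case just discussed. It compares the variances of the raw values and of their class marks; the function names and seed are illustrative.

```python
import math
import random

def variance(values):
    """Variance with divisor n; adequate for a large simulated sample."""
    m = math.fsum(values) / len(values)
    return math.fsum((v - m) ** 2 for v in values) / len(values)

def grouping_increase(n=200_000, sigma=10.0, width=5.0, seed=3):
    """Return (observed increase in variance from grouping, theoretical I^2/12)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, sigma) for _ in range(n)]
    # Replace each value by the center of its class [k*width, (k+1)*width).
    marks = [(x // width + 0.5) * width for x in raw]
    return variance(marks) - variance(raw), width ** 2 / 12

observed, theory = grouping_increase()
print(f"observed increase {observed:.2f}, I^2/12 = {theory:.2f}")
print(f"relative increase I^2/(12 sigma^2) = {theory / 10.0 ** 2:.4f}, 1/48 = {1 / 48:.4f}")
```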

With a discrete variable, there is often no rounding and no loss of
accuracy in using a frequency distribution to compute the sample mean
and variance. For instance, in a study of accidents per week, the number
of accidents might range only from 0 to 5. The six classes 0, 1, 2, 3, 4, 5
give a complete representation of the sample data without any rounding.

3.11 — Computation of $\bar{X}$ and s in large samples: example. The data
in table 3.11.1 come from a sample of 533 weights of swine, arranged in
22 classes. The steps in the calculation of $\bar{X}$ and s are given under the table.

A further simplification comes from coding the class marks, as shown
in the third column. Place the 0 on the coded scale at or near the class
mark that has the highest frequency. We chose this origin at G = 170
pounds. The classes above this class are coded as 1, 2, 3, etc.; those
TABLE 3.11.1

Frequency Distribution of Live Weights of 533 Swine. Computation of Mean
and Standard Deviation. I = 10 Pounds, G = 170 Pounds

  Class Mark,                Code        Sum of Code    Sum of
  Pounds        Frequency    Numbers     Numbers        Squares
  X             f            U           fU             fU²
   80             1          -9            -9              81
   90             0          -8             0               0
  100             0          -7             0               0
  110             7          -6           -42             252
  120            18          -5           -90             450
  130            21          -4           -84             336
  140            22          -3           -66             198
  150            44          -2           -88             176
  160            67          -1           -67              67
  170            76           0             0               0
  180            55           1            55              55
  190            57           2           114             228
  200            47           3           141             423
  210            33           4           132             528
  220            30           5           150             750
  230            23           6           138             828
  240            11           7            77             539
  250             5           8            40             320
  260             5           9            45             405
  270             4          10            40             400
  280             5          11            55             605
  290             2          12            24             288
  Total       n = 533                  ΣfU = 565     ΣfU² = 6,929

$(\Sigma fU)^2/n = (565)^2/533 = 598.92$
$\Sigma u^2 = \Sigma fU^2 - (\Sigma fU)^2/n = 6{,}929 - 598.92 = 6{,}330.08$
$s_U^2 = \Sigma u^2/(n - 1) = 6{,}330.08/532 = 11.8986, \quad s_U = 3.45$
$I\bar{U} = 10(565/533) = 10.6$ pounds
$\bar{X} = G + I\bar{U} = 170 + 10.6 = 180.6$ pounds
$s = I s_U = (10)(3.45) = 34.5$ pounds


below as -1, -2, -3, etc. It is important to know the relation between
your original and your coded class marks. If X (dropping the prime) is an
original class mark and U is its coded value, this relation is

$$X = G + IU$$

where I is the width of the class interval (10 pounds in this example). To
verify the rule, when U is -5, what is X? We have $X = 170 + (10)(-5) = 120$,
as appears in column 1.

In the computations we first find the sample mean and variance of
U, namely $\bar{U}$ and $s_U^2$. From the above relation we get

$$\bar{X} = G + I\bar{U}$$

and

$$s = I s_U$$

With these relations the steps given under table 3.11.1 are easily
followed. With a computing machine the individual values $fU^2$ need not be
written down. Their sum can be found by taking the sum of products of
the column U with the column fU. The individual values fU are required;
pay attention to their signs when adding them.

Note that s is 3.45 times the class interval I, so that the relative increase
in variance from grouping is only about $1/(12 \times 3.45^2)$, or 0.7%: the loss of
accuracy due to the use of class marks is trivial.

Sheppard's Correction. From the theory presented in the previous
section, a consequence is that $s^2$, as computed in table 3.11.1, is an estimate
of $\sigma^2 + I^2/12$, rather than of $\sigma^2$ itself. A correction introduced by
W. F. Sheppard (6) is to subtract $I^2/12$ from the value of $s^2$, in order to
obtain a more nearly unbiased estimate of $\sigma^2$. In this example, with
$s^2 = 1{,}189.86$, the correction amounts to only 100/12, or 8.33. The
corrected value of s is 34.4 pounds, as against our computed 34.5 pounds
(3.44 as against 3.45 class intervals). The correction is seldom substantial.
The corrected value should not be used in a test of significance (7).
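The shortcut computation of table 3.11.1 translates directly into code. This Python sketch codes the class marks by $U = (X - G)/I$, accumulates $\Sigma fU$ and $\Sigma fU^2$, decodes with $\bar{X} = G + I\bar{U}$ and $s = I s_U$, and then applies Sheppard's correction. The function name `grouped_mean_sd` is hypothetical.

```python
import math

def grouped_mean_sd(marks, freqs, origin, width):
    """Mean and SD from a frequency distribution using coded class marks.

    U = (X - origin) / width, so that X = origin + width * U (the relation X = G + IU).
    """
    n = sum(freqs)
    codes = [round((x - origin) / width) for x in marks]
    sum_fu = sum(f * u for f, u in zip(freqs, codes))
    sum_fu2 = sum(f * u * u for f, u in zip(freqs, codes))
    ss_u = sum_fu2 - sum_fu ** 2 / n          # sum of squared deviations of U
    s_u = math.sqrt(ss_u / (n - 1))
    return origin + width * sum_fu / n, width * s_u

# Table 3.11.1: live weights of 533 swine, I = 10 pounds, G = 170 pounds.
marks = list(range(80, 300, 10))
freqs = [1, 0, 0, 7, 18, 21, 22, 44, 67, 76, 55, 57, 47, 33, 30, 23, 11, 5, 5, 4, 5, 2]
mean, sd = grouped_mean_sd(marks, freqs, origin=170, width=10)
# Sheppard's correction: subtract I^2/12 from s^2 before taking the square root.
sd_sheppard = math.sqrt(sd ** 2 - 10 ** 2 / 12)
print(f"mean {mean:.1f}, s {sd:.1f}, corrected s {sd_sheppard:.1f} pounds")
```

Called with the class marks 58 to 76 and the frequencies of example 3.11.1 (origin 68, width 2), the same function should give the answers stated there.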


EXAMPLE 3.11.1 — The data show the frequency distribution of the heights of 8,585
men, arranged in ten 2-in. classes. The number of classes is too small for accurate work, but
gives an easy exercise. Compute $\bar{X}$ and s using a convenient coding. Ans. $\bar{X}$ = 67.53 in.,
s = 2.62 in.


  Class Mark (in.)   Frequency     Class Mark (in.)   Frequency
        58                6              68             2,559
        60               55              70             1,709
        62              252              72               594
        64            1,063              74               111
        66            2,213              76                23

EXAMPLE 3.11.2 — Apply Sheppard's correction and report the corrected s. Ans. 2.56 in.


EXAMPLE 3.11.3 — This baby example illustrates how the accuracy of the shortcut
method improves when the class marks are the means of the items in the classes. The original
data consist of the fourteen values: 0, 0, 10, 12, 14, 16, 20, 22, 24, 25, 29, 32, 34, 49. (i) Compute
$\bar{X}$ and s directly from these data. (ii) Form a frequency distribution with classes 0-9,
10-19, 20-29, 30-39, and 40-49. Compute $\bar{X}$ and s from the conventional class marks,
4.5, 14.5, 24.5, 34.5, and 44.5. (iii) In the same frequency distribution, find the actual means
of the items in each class, and use these means as the class marks. (Coding doesn't help here.)
Ans. (i) $\bar{X}$ = 20.5, s = 13.4. (ii) $\bar{X}$ = 21.6, s = 11.4, both quite inaccurate. (iii) $\bar{X}$ = 20.5,
s = 13.2. Despite the rounding errors that contribute to this s, it is smaller than the original
s in (i). This is an effect of sampling error in this small sample.
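Example 3.11.3 can be worked in a few lines. The sketch computes $\bar{X}$ and s three ways: from the raw values, with each value replaced by its conventional class mark, and with each value replaced by the mean of its class. The helper name `replace_by_class` is illustrative.

```python
import statistics

data = [0, 0, 10, 12, 14, 16, 20, 22, 24, 25, 29, 32, 34, 49]
lower_limits = [0, 10, 20, 30, 40]          # classes 0-9, 10-19, ..., 40-49

def replace_by_class(values, mark_of):
    """Replace each value by the mark of its class (class width 10)."""
    return [mark_of[v // 10 * 10] for v in values]

# (ii) conventional class marks: the midpoint of each class.
midpoints = {lo: lo + 4.5 for lo in lower_limits}
# (iii) class marks equal to the actual mean of the items in each class.
class_means = {lo: statistics.fmean([v for v in data if lo <= v < lo + 10])
               for lo in lower_limits}

for label, values in [("(i) raw", data),
                      ("(ii) midpoints", replace_by_class(data, midpoints)),
                      ("(iii) class means", replace_by_class(data, class_means))]:
    print(f"{label:18s} mean {statistics.fmean(values):.1f}  s {statistics.stdev(values):.1f}")
```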



84 Chapter 3: Experimental Sampling From a Normal Population

EXAMPLE 3.11.4 — The yields in grams of 1,499 rows of wheat are recorded by Wiebe
(9). They have been tabulated as follows:


  Class Mark  Frequency    Class Mark  Frequency    Class Mark  Frequency
     375           3          600         127          825          10
     400          13          625         140          850          10
     425          41          650         122          875           4
     450          99          675          94          900           4
     475          97          700          64          925           2
     500         118          725          49          950           3
     525         138          750          31          975           1
     550         146          775          26        1,000           1
     575         136          800          20
                                                       Total      1,499


Compute $\bar{X}$ = 587.74 grams and s = 100.55 grams. Are there enough classes in this
distribution?


3.12 — Tests of normality. Since many of the standard statistical
techniques are based on the assumption of normality, methods for judging the
normality of a set of data are of interest. In this and in the following
sections, three tests will be illustrated from the frequency distribution of
means of samples of 100 drawn from the population of city sizes in section
2.12 (p. 51). The histogram of this frequency distribution, shown in the
bottom part of figure 2.12.2, p. 55, gave the impression that a normal
distribution would not be a good fit. We can now verify this impression
in a quantitative manner.

In the first test, often called the $\chi^2$ goodness of fit test, the data are
grouped into classes to form a frequency distribution and the sample
mean $\bar{X}$ and standard deviation s are calculated. From these quantities, a
normal distribution is fitted and the expected frequencies in each class
are obtained as described in section 3.4 (p. 70). Table 3.12.1 presents the
observed frequencies $f_i$ and the expected frequencies $F_i$.
For each class, compute and record the quantity

$$(f_i - F_i)^2/F_i = (\text{Obs.} - \text{Exp.})^2/\text{Exp.}$$

The test criterion is

$$\chi^2 = \Sigma (f_i - F_i)^2/F_i$$

summed over the classes. If the data actually come from a normal
distribution, this quantity follows approximately the theoretical $\chi^2$
distribution with $(k - 3)$ d.f., where k is the number of classes used in computing
$\chi^2$. If the data come from some other distribution, the observed $f_i$
will tend to agree poorly with the values of $F_i$ that are expected on the
assumption of normality, and the computed $\chi^2$ becomes large.
Consequently, large values of $\chi^2$ cause rejection of the hypothesis of normality.
TABLE 3.12.1

Calculation of the Goodness of Fit $\chi^2$ for the Distribution of Means of
Samples of 100 City Sizes

  Class Limits        Frequencies
  (1,000's)         Obs. f    Exp. F     (f - F)²/F
  Under 129            9      20.30         6.29
  130-139             35      30.80         0.57
  140-149             68      55.70         2.72
  150-159             94      80.65         2.21
  160-169             90      93.55         0.13
  170-179             76      87.00         1.39
  180-189             62      64.80         0.12
  190-199             28      38.70         2.96
  200-209             27      18.55         3.85
  210-219              4       7.10         1.35
  220-229              5       2.20  }
  230-239              1       0.50  }      6.04
  240-                 1       0.15  }
  Total              500     500.00        27.63

  $\chi^2 = 27.63$, d.f. = 11 - 3 = 8, P < 0.005



The theorem that this quantity follows the theoretical distribution of
$\chi^2$ when the null hypothesis holds, and that the degrees of freedom are
$(k - 3)$, requires advanced methods of proof. The subtracted number 3 in
the d.f. may be thought of as the number of ways in which the observed
and expected frequencies have been forced to agree in the process of
fitting the normal distribution. The numbers $f_i$ and $F_i$ both add to 500,
and the sets agree in the values of $\bar{X}$ and s that they give.

The theorem also requires that the expected numbers not be too
small. Small expectations are likely to occur only in the extreme classes.
A working rule (10) is that the two extreme expectations may each be as
low as 1, provided that most of the other expected values exceed 5. In
table 3.12.1, small expectations occur in the three highest classes. In this
event, classes are combined to give an expectation of at least one. The
three highest classes give a combined $f_i$ of 7 and $F_i$ of 2.85. The
contribution to $\chi^2$ is $(4.15)^2/2.85 = 6.04$.

For these data, k = 11 after combination, so that $\chi^2 = 27.63$ has 8 d.f.
Reference to table A 5 shows that the hypothesis of normality is rejected
at the 0.5% level, the most extreme level given in this table.
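The $\chi^2$ of table 3.12.1 can be recomputed from the observed and expected frequencies. In this sketch the last three classes are pooled, as in the text, and d.f. = k - 3. Because the table adds rounded entries, the unrounded total can differ from 27.63 in the second decimal. The helper name `pool_tail` is ours.

```python
observed = [9, 35, 68, 94, 90, 76, 62, 28, 27, 4, 5, 1, 1]
expected = [20.30, 30.80, 55.70, 80.65, 93.55, 87.00, 64.80,
            38.70, 18.55, 7.10, 2.20, 0.50, 0.15]

def pool_tail(obs, exp, k):
    """Combine the last k classes into one, to avoid very small expectations."""
    return obs[:-k] + [sum(obs[-k:])], exp[:-k] + [sum(exp[-k:])]

obs, exp = pool_tail(observed, expected, 3)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 3          # classes after pooling, minus 3 for the fitted normal
print(f"k = {len(obs)}, chi-square = {chi2:.2f}, d.f. = {df}")
```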

The $\chi^2$ test may be described as a non-specific test, in that the test
criterion is directed against no particular type of departure from
normality. Examples occur in which the data are noticeably skew, although
the $\chi^2$ test does not reject the null hypothesis. An alternative test that is
designed to detect skewness is often used as a supplement to the $\chi^2$ test.
3.13 — A test of skewness. A measure of the amount of skewness in a
population is given by the average value of $(X - \mu)^3$, taken over the
population. This quantity is called the third moment about the mean. If
low values of X are bunched close to the mean $\mu$ but high values extend
far above the mean, this measure will be positive, since the large positive
contributions $(X - \mu)^3$ when X exceeds $\mu$ will predominate over the
smaller negative contributions $(X - \mu)^3$ obtained when X is less than $\mu$.
Populations with negative skewness, in which the lower tail is the
extended one, are also encountered. To render this measure independent
of the scale on which the data are recorded, it is divided by $\sigma^3$. The
resulting coefficient of skewness is denoted sometimes by $\sqrt{\beta_1}$ and sometimes
by $\gamma_1$.

The sample estimate of this coefficient is denoted by $\sqrt{b_1}$ or $g_1$. We
compute

$$m_3 = \Sigma(X - \bar{X})^3/n$$
$$m_2 = \Sigma(X - \bar{X})^2/n$$

and take

$$\sqrt{b_1} = g_1 = m_3/(m_2\sqrt{m_2})$$

Note that in computing $m_2$, the sample variance, we have divided by n
instead of our customary $(n - 1)$. This makes subsequent calculations
slightly easier.

The calculations are illustrated for the means of city sizes in table
3.13.1. Coding is worthwhile. Since $\sqrt{b_1}$ is dimensionless, the whole
calculation can be done in the coded scale, with no need to decode.
Having chosen coded values U, write down their squares and cubes (paying
attention to signs). The $U^4$ values are not needed in this section. Form
the sums of products with the f's as indicated, and divide each sum by n
to give the quantities $h_1$, $h_2$, $h_3$. Carry two extra decimal places in the
h's. The moments $m_2$ and $m_3$ are then obtained from the algebraic
identities given under the table. Finally, we obtain $\sqrt{b_1} = 0.4707$.

If the sample comes from a normal population, $\sqrt{b_1}$ is approximately
normally distributed with mean zero and S.D. $\sqrt{6/n}$, or in this case
$\sqrt{6/500} = 0.110$. Since $\sqrt{b_1}$ is over 4 times its S.D., the positive skewness
is confirmed. The assumption that $\sqrt{b_1}$ is normally distributed is
accurate enough for this test if n exceeds 150. For sample sizes between 25
and 200, the one-tailed 5% and 1% significance levels of $\sqrt{b_1}$, computed
from a more accurate approximation, are given in table A 6.

3.14 — Tests for kurtosis. A further type of departure from normality
is called kurtosis. In a population, a measure of kurtosis is the average
value of $(X - \mu)^4$, divided by $\sigma^4$. For the normal distribution, this ratio
has the value 3. If the ratio exceeds 3, there is usually an excess of values
near the mean and far from it, with a corresponding depletion of the flanks
of the distribution curve. This is the manner in which the t-distribution
TABLE 3.13.1

Computations for Tests of Skewness and Kurtosis

  Lower Class
  Limit          f      U     U²      U³       U⁴
  120-           9     -4     16     -64      256
  130-          35     -3      9     -27       81
  140-          68     -2      4      -8       16
  150-          94     -1      1      -1        1
  160-          90      0      0       0        0
  170-          76      1      1       1        1
  180-          62      2      4       8       16
  190-          28      3      9      27       81
  200-          27      4     16      64      256
  210-           4      5     25     125      625
  220-           5      6     36     216    1,296
  230-           1      7     49     343    2,401
  240-           1      8     64     512    4,096

n = 500

Test of skewness
$\Sigma fU = +86, \quad h_1 = \Sigma fU/n = +0.172$
$\Sigma fU^2 = 2{,}226, \quad h_2 = \Sigma fU^2/n = 4.452$
$\Sigma fU^3 = +3{,}332, \quad h_3 = \Sigma fU^3/n = +6.664$
$m_2 = h_2 - h_1^2 = 4.4224$
$m_3 = h_3 - 3h_1h_2 + 2h_1^3 = 4.3770$
$\sqrt{b_1} = m_3/(m_2\sqrt{m_2}) = 4.3770/(4.4224\sqrt{4.4224}) = 0.4707$

Test of kurtosis
$\Sigma fU^4 = 32{,}046, \quad h_4 = \Sigma fU^4/n = 64.092$
$m_4 = h_4 - 4h_1h_3 + 6h_1^2h_2 - 3h_1^4 = 60.2948$
$b_2 = m_4/m_2^2 = 60.2948/(4.4224)^2 = 3.083$


departs from the normal. Ratios less than 3 result from curves that have 
a flatter top than the normal. 

A sample estimate of the amount of kurtosis is given by

$$g_2 = b_2 - 3 = (m_4/m_2^2) - 3,$$

where

$$m_4 = \Sigma(X - \bar{X})^4/n$$

is the fourth moment of the sample about its mean. Notice that the normal
distribution value 3 has been subtracted, with the result that peaked
distributions show positive kurtosis and flat-topped distributions show
negative kurtosis.

The shortcut computation of $m_4$ and $b_2$ from the coded values U is
shown under table 3.13.1. For this sample, $g_2 = b_2 - 3$ has the value
+0.083. In very large samples from the normal distribution, $g_2$ is
normally distributed with mean 0 and S.D. $\sqrt{24/n}$, which is 0.219 here since n is
500. The sample value of $g_2$ is much smaller than its standard error, so
that the amount of kurtosis in the population appears to be trivial.
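The identities under table 3.13.1 are straightforward to program. This Python sketch forms $h_1, \dots, h_4$ from the coded frequencies, converts them to $m_2$, $m_3$, $m_4$, and reports $\sqrt{b_1}$ and $g_2$ with their large-sample standard errors $\sqrt{6/n}$ and $\sqrt{24/n}$. Carried at full precision, $\sqrt{b_1}$ comes out within a unit or so in the fourth decimal of the 0.4707 shown in the table.

```python
import math

freqs = [9, 35, 68, 94, 90, 76, 62, 28, 27, 4, 5, 1, 1]
codes = range(-4, 9)                      # U = -4, ..., 8 as in table 3.13.1
n = sum(freqs)

# h_r = sum(f * U^r) / n, the raw moments of the coded values
h1, h2, h3, h4 = (sum(f * u ** r for f, u in zip(freqs, codes)) / n for r in (1, 2, 3, 4))

# Moments about the mean from the algebraic identities under the table
m2 = h2 - h1 ** 2
m3 = h3 - 3 * h1 * h2 + 2 * h1 ** 3
m4 = h4 - 4 * h1 * h3 + 6 * h1 ** 2 * h2 - 3 * h1 ** 4

g1 = m3 / (m2 * math.sqrt(m2))            # sqrt(b1), coefficient of skewness
g2 = m4 / m2 ** 2 - 3                     # b2 - 3, measure of kurtosis

print(f"g1 = {g1:.4f} (S.E. {math.sqrt(6 / n):.3f})")
print(f"g2 = {g2:.3f} (S.E. {math.sqrt(24 / n):.3f})")
```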




Unfortunately, the distribution of $g_2$ does not approach the normal
closely until the sample size is over 1,000. For sample sizes between 200
and 1,000, table A 6 contains better approximations to the 5% and 1%
significance levels. Since the distribution of $g_2$ is skew, the two tails are
shown separately. For n = 500, the upper 5% value of $g_2$ is +0.37, much
greater than the value 0.083 found in this sample.

For sample sizes less than 200, no tables of the significance levels of
$g_2$ are at present available. R. C. Geary (11) developed an alternative
test criterion for kurtosis,

$$a = (\text{mean deviation})/(\text{standard deviation}) = \Sigma|X - \bar{X}|/(n\sqrt{m_2})$$

and tabulated its significance levels for sample sizes down to n = 11. If
X is a normal deviate, the value of a when computed for the whole
population is 0.7979. Positive kurtosis produces lower values, and negative
kurtosis higher values, of a. When applied to the same data, a and $g_2$
usually agree well in their verdicts. The advantages of a are that tables
are available for smaller sample sizes and that a is easier to compute.

An identity simplifies the calculation of the numerator of a. This will
be illustrated for the coded scale in table 3.13.1. Let

Σ′ = sum of all observations that exceed $\bar{U}$
n′ = number of observations that exceed $\bar{U}$

Then, because the deviations above the mean and those below it have the
same total size,

$$\Sigma|U - \bar{U}| = 2(\Sigma' - n'\bar{U})$$

Since $\bar{U} = 0.172$, all observations in the classes with U = 1 or more
exceed $\bar{U}$. This gives Σ′ = 457, n′ = 204. Hence,

$$\Sigma|U - \bar{U}| = 2\{457 - (204)(0.172)\} = 843.82$$

Since $m_2 = 4.4224$, we have

$$a = 843.82/(500\sqrt{4.4224}) = 0.802$$

This is little greater than the value 0.7979 for the normal distribution,
in agreement with the result given by $g_2$. For n = 500 the upper 5% level
of a is about 0.814.
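Geary's a for the same coded data is sketched below, computing the numerator both through the identity above and directly as a check. The variable names are illustrative.

```python
import math

freqs = [9, 35, 68, 94, 90, 76, 62, 28, 27, 4, 5, 1, 1]
codes = list(range(-4, 9))                     # coded values U from table 3.13.1
n = sum(freqs)
u_bar = sum(f * u for f, u in zip(freqs, codes)) / n
m2 = sum(f * (u - u_bar) ** 2 for f, u in zip(freqs, codes)) / n

# Identity: sum|U - Ubar| = 2 * (sum of the U's above Ubar - n' * Ubar)
above = [(f, u) for f, u in zip(freqs, codes) if u > u_bar]
sum_above = sum(f * u for f, u in above)
n_above = sum(f for f, _ in above)
abs_dev_identity = 2 * (sum_above - n_above * u_bar)

# Direct computation of the same quantity, for comparison
abs_dev_direct = sum(f * abs(u - u_bar) for f, u in zip(freqs, codes))

a = abs_dev_identity / (n * math.sqrt(m2))
print(f"sum |U - Ubar| = {abs_dev_identity:.2f} (direct {abs_dev_direct:.2f}); a = {a:.4f}")
```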

3.15 — Effects of skewness and kurtosis. In samples from non-normal
populations, the quantities $g_1$ and $g_2$ are useful as estimates of the
corresponding population values $\gamma_1$ and $\gamma_2$, which characterize the common
types of non-normality. K. Pearson produced a family of theoretical
non-normal curves intended to simulate the shapes of frequency distributions
having any specified values of $\gamma_1$ and $\gamma_2$, provided that the non-normality
was not too extreme.

The quantities $\gamma_1$ and $\gamma_2$ have also been useful in studying the
distributions of $\bar{X}$ and $s^2$ when the original population is non-normal. Two
results will be quoted. For the distribution of $\bar{X}$ in random samples of
size n,
$$\gamma_1(\bar{X}) = \gamma_1/\sqrt{n}; \qquad \gamma_2(\bar{X}) = \gamma_2/n$$

Thus, in the distribution of $\bar{X}$, the measures of skewness and kurtosis
both go to zero when the sample size increases, as would be expected from
the Central Limit Theorem. Since the kurtosis is damped much faster
than the skewness, it is not surprising that in our sample means $g_1$ was
substantial but $g_2$ small.

Secondly, the exact variance of $s^2$ with f degrees of freedom is known
to be

$$V(s^2) = \frac{2\sigma^4}{f}\left[1 + \frac{f}{f + 1}\,\frac{\gamma_2}{2}\right]$$

The factor outside the brackets is the variance of $s^2$ in samples from a
normal population. The term inside the brackets is the factor by which
the normal variance is multiplied when the population is non-normal.
For example, if the measure of kurtosis, $\gamma_2$, is 1, the variance of $s^2$ is
about 1.5 times as large as it is in a normal population. With $\gamma_2 = 2$, the
variance of $s^2$ is about twice as large as in a normal population. These
results show that the distribution of $s^2$ is sensitive to amounts of kurtosis
that may pass unnoticed in handling the data.
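The inflation factor in the formula for $V(s^2)$ can be tabulated quickly. In this sketch f = 99 is an arbitrary choice; for large f the factor approaches $1 + \gamma_2/2$, which gives the 1.5 and 2 quoted above. The function name is hypothetical.

```python
def variance_of_s2(sigma, f, gamma2):
    """Variance of s^2 with f degrees of freedom, from the formula in the text."""
    normal_part = 2 * sigma ** 4 / f              # value for a normal population
    inflation = 1 + (f / (f + 1)) * gamma2 / 2    # factor due to kurtosis gamma2
    return normal_part * inflation

sigma, f = 1.0, 99
for gamma2 in (0, 1, 2):
    ratio = variance_of_s2(sigma, f, gamma2) / variance_of_s2(sigma, f, 0)
    print(f"gamma2 = {gamma2}: variance of s^2 is {ratio:.3f} times the normal value")
```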

EXAMPLE 3.15.1 — In table 3.2.2, compute $g_1 = -0.0139$ and $g_2 = 0.0460$, showing
that the distribution is practically normal in these respects.

EXAMPLE 3.15.2 — In table 3.5.2 is the sampling distribution of 511 standard
deviations. Calculate $g_1 = 0.3074$ with standard error 0.108. As expected, this indicates that
the distribution is positively skew.

EXAMPLE 3.15.3 — The 511 values of t discussed in section 3.8 were distributed as
follows:


  Class Mark    f     Class Mark    f     Class Mark    f     Class Mark    f
    -3.13       3       -1.13      29        0.87      31        2.87      1
    -2.88       5       -0.88      35        1.12      23        3.12      1
    -2.63       ?       -0.63      38        1.37      17        3.37      2
    -2.38       ?       -0.38      40        1.62       ?        3.62      0
    -2.13       6       -0.13       ?        1.87       8        3.87      0
    -1.88       ?        0.12       ?        2.12      10        4.12      0
    -1.63      21        0.37      43        2.37       6        4.37      1
    -1.38      16        0.62       ?        2.62       2
                                                              Total    511


The highly significant value of $g_2 = 0.5340$ shows that the frequencies near the mode and
in the tails are greater than in the normal distribution, those in the flanks being less. This
was expected. But $g_1 = 0.1356$ is non-significant, which is also expected because the
theoretical distribution of t is symmetrical.
CHAPTER FOUR


The comparison of two samples


4.1 — Estimates and tests of differences. Investigations are often
designed to discover and evaluate differences between effects rather than the
effects themselves. It is the difference between the amounts learned under
two methods of teaching, the difference between the lengths of life of two
types of glassware, or the difference between the degrees of relief reported
from two pain-relieving drugs that is wanted. In this chapter we consider
the simplest investigation of this type, in which two groups or two
procedures are compared. In experimentation, these procedures are often
called the treatments. Such a study may be conducted in two ways.

Paired samples. Pairs of similar individuals or things are selected. One
treatment is applied to one member of each pair, the other treatment to
the second member. The members of a pair may be two students of
similar ability; two patients of the same age and sex who have just
undergone the same type of operation; or two male mice from the same litter.
A common application occurs in self-pairing, in which a single individual
is measured on two occasions. For example, the blood pressure of a
subject might be measured before and after heavy exercise. For any pair, the
difference between the measurements given by the two members is an
estimate of the difference in the effects of the two treatments or procedures.

With only a single pair it is impossible to say whether the difference 
in behavior is to be attributed to the difference in treatment, to the natural 
variability of the individuals, or partly to both. There must be a number 
of pairs. The data to be analyzed consist of a sample of n differences in 
measurement. 

Independent samples. This case, which is commoner, arises whenever we
wish to compare the means of two populations and have drawn a sample 
from each quite independently. We might have a sample of men aged 
50-55 and one of men aged 30-35, in order to compare the amounts 
spent on life insurance. Or we might have a sample of high school seniors 
from rural schools and one from urban schools, in order to compare 
their knowledge of current affairs as judged by a special examination on 
this subject. Independent samples are widely used in experimentation
when no suitable basis for pairing exists, as, for example, in comparing 
the lengths of life of two types of drinking glass under the ordinary condi- 
tions of restaurant use. 

4.2 — A simulated paired experiment. Eight pairs of random normal
deviates were drawn from a table of random normal deviates. The first 
member of each pair represents the result produced by a Standard pro- 
cedure, while the second member is the result produced by a New proce- 
dure that is being compared with the Standard. The eight differences, 
New-St., are shown in the column headed Case I in table 4.2.1.


TABLE 4.2.1

A Simulated Paired Experiment

Pair            Case I            Case II           Case III
                New-St. ($D_i$)   New-St. ($D_i$)   New-St. ($D_i$)
1               +3.2              +13.2             +4.2
2               -1.7              +8.3              -0.7
3               +0.8              +10.8             +1.8
4               -0.3              +9.7              +0.7
5               +0.5              +10.5             +1.5
6               +1.2              +11.2             +2.2
7               -1.1              +8.9              -0.1
8               -0.4              +9.6              +0.6
Mean ($\bar D$)   +0.28             +10.28            +1.28
$s_D$             1.527             1.527             1.527
$s_{\bar D}$      0.540             0.540             0.540


Since the results for the New and Standard procedures were drawn
from the same normal population, Case I simulates a situation in which
there is no difference in effect between the two procedures. The observed
differences represent the natural variability that is always present in
experiments. It is obvious on inspection that the eight differences do not
indicate any superiority of the New procedure. Four of the differences are
+ and four are -, and the mean difference is small.

The results in Case II were obtained from those in Case I by adding
+10 to every figure, to represent a situation in which the New procedure
is actually 10 units better than the Standard. On looking at the data, most
investigators would reach the judgment that the superiority of the New
procedure is definitely established, and would probably conclude that
the average advantage in favor of it is not far from 10 units.

Case III is more puzzling. We added + 1 to every figure in Case I, 
so that the New procedure gives a small gain over the Standard. The New 
procedure wins 6 times out of the 8 trials, and some workers might conclude
that the results confirm the superiority of the New procedure.
Others might disagree. They might point out that it is not too unusual
for a fair coin to show heads in 6 tosses out of 8, and that the individual 
results range from an advantage of 0.7 units for the Standard to an ad- 
vantage of 4.2 units for the New procedure. They would argue that the 
results are inconclusive. We shall see what verdicts are suggested by the 
statistical analyses in these three cases. 

The data also illustrate the assumptions made in the analysis of a
paired trial. The differences $D_i$ in the individual pairs are assumed to be
distributed about a mean $\mu_D$, which represents the average difference in
the effects of the two treatments over the population of which these pairs
are a random sample. The deviations $D_i - \mu_D$ may be due to various
causes, in particular to inherent differences between the members of the
pair and to any errors of measurement to which the measuring instruments
are subject. Another source of this variation is that a treatment may
actually have different effects on different members of the population. A
lotion for the relief of muscular pains may be more successful with some
types of pain than with others. The adage “One man’s meat is another
man’s poison” expresses this variability in extreme form. For many
applications it is important to study the extent to which the effect of a
treatment varies from one member of the population to another. This
requires a more elaborate analysis, and usually a more complex experiment,
than we are discussing at present. In the simple paired trial we compare
only the average effects of the two treatments or procedures over the
population.

In the analysis, the deviations $D_i - \mu_D$ are assumed to be normally
and independently distributed with population mean zero. The consequences
of failures in these assumptions are discussed in chapter 11.

When these assumptions hold, the sample mean difference $\bar D$ is
normally distributed about $\mu_D$ with standard deviation or standard error
$\sigma_D/\sqrt{n}$, where $\sigma_D$ is the S.D. of the population of differences. The value
of $\sigma_D$ is seldom known, but the sample furnishes an estimate

$$s_D = \sqrt{\frac{\sum (D_i - \bar D)^2}{n - 1}} = \sqrt{\frac{\sum D_i^2 - (\sum D_i)^2/n}{n - 1}}$$

Hence, $s_{\bar D} = s_D/\sqrt{n}$ is an estimate of $\sigma_{\bar D}$, based on $(n - 1)$ d.f.

The important consequence of these results is that the quantity

$$t = (\bar D - \mu_D)/s_{\bar D}$$

follows Student’s t-distribution with $(n - 1)$ d.f., where n is the number of
pairs. The t-distribution may be used to test the null hypothesis that
$\mu_D = 0$, or to compute a confidence interval for $\mu_D$.
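As a check on these formulas, the Python sketch below (our own illustration; the helper name `paired_summary` is hypothetical) applies both the definitional and the shortcut form of $s_D$ to the Case I differences of table 4.2.1 and forms t for all three cases. Because it carries full precision, its t values differ slightly from hand calculations that divide the rounded means by 0.540; for Case III it gives t of about 2.362.

```python
import math

# Case I differences (New - St.) from table 4.2.1. Cases II and III were
# formed by adding +10 and +1 to every difference.
case_1 = [3.2, -1.7, 0.8, -0.3, 0.5, 1.2, -1.1, -0.4]
cases = {
    "I": case_1,
    "II": [d + 10 for d in case_1],
    "III": [d + 1 for d in case_1],
}


def paired_summary(diffs):
    """Return (D-bar, s_D, s_Dbar, t) for the null hypothesis mu_D = 0."""
    n = len(diffs)
    d_bar = sum(diffs) / n
    # Definitional form: sum of squared deviations from the sample mean.
    ss_definition = sum((d - d_bar) ** 2 for d in diffs)
    # Shortcut form: sum of squares minus the correction term (sum D)^2 / n.
    ss_shortcut = sum(d * d for d in diffs) - sum(diffs) ** 2 / n
    assert math.isclose(ss_definition, ss_shortcut)
    s_d = math.sqrt(ss_definition / (n - 1))
    s_dbar = s_d / math.sqrt(n)
    return d_bar, s_d, s_dbar, d_bar / s_dbar


for name, diffs in cases.items():
    d_bar, s_d, s_dbar, t = paired_summary(diffs)
    print(f"Case {name}: D-bar={d_bar:+.3f}  s_D={s_d:.3f}  "
          f"s_Dbar={s_dbar:.3f}  t={t:.3f}")
```

The printed $s_D$ and $s_{\bar D}$ (1.527 and 0.540) are the same for all three cases, since adding a constant to every $D_i$ leaves the deviations unchanged.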

Test of significance. The test will be applied first to the doubtful Case
III. The values of $s_D$ and $s_{\bar D}$ are shown at the foot of table 4.2.1. Note
that these are exactly the same in all three cases, since the addition of a
constant to all the $D_i$ does not affect the deviations $(D_i - \bar D)$. For
Case III we have
$$t = \bar D/s_{\bar D} = 1.28/0.540 = 2.370$$

With 7 d.f., table A 4 shows that the 5% level of t in a two-tailed test is
2.365. The observed mean difference just reaches the 5% level, so that 
the data point to a superiority of the new treatment. 

In Case II, t = 10.28/0.540 = 19.04. This value lies far beyond even 
the 0.1% level (5.405) in table A 4. We might report: “P < 0.001.” 

In Case I, t = 0.28/0.540 = 0.519. From table A 4, an absolute
value of t = 0.711 is exceeded 50% of the time in sampling from a population
with $\mu_D = 0$. The test provides no evidence on which to reject the
null hypothesis in Case I. To sum up, the tests confirm the judgment of 
the preliminary inspection in all three cases. 

Confidence interval. From the formula given in section 2.16, the 95%
confidence interval for $\mu_D$ is

$$\bar D \pm t_{0.05}\, s_{\bar D} = \bar D \pm (2.365)(0.540) = \bar D \pm 1.28$$
In the simulated example the limits are as follows. 

Case I : -1.00 to 1.56 

Case II : 9.00 to 11.56 

Case III : 0.00 to 2.56 

As always happens, the 95% confidence limits agree with the verdict given 
by the 5% tests of significance. Either technique may be used. 
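The agreement is no accident: $|t| > t_{0.05}$ exactly when $\bar D - t_{0.05}s_{\bar D} > 0$ or $\bar D + t_{0.05}s_{\bar D} < 0$, that is, when the 95% interval excludes zero. The Python sketch below, an illustration we have added using the tabular value 2.365 for 7 d.f., checks this for the three simulated cases at full precision. Case III is a borderline case: without rounding, t is about 2.362 and the lower limit about -0.002, so the test and the interval both fall a hair on the non-significant side, but they still agree with each other.

```python
import math

T_05_7DF = 2.365  # two-tailed 5% point of t for 7 d.f., as read from table A 4

# Case I differences from table 4.2.1; Cases II and III add +10 and +1.
case_1 = [3.2, -1.7, 0.8, -0.3, 0.5, 1.2, -1.1, -0.4]
n = len(case_1)
mean_1 = sum(case_1) / n
s_d = math.sqrt(sum((d - mean_1) ** 2 for d in case_1) / (n - 1))
# Adding a constant leaves the deviations unchanged, so s_Dbar is shared.
s_dbar = s_d / math.sqrt(n)

results = {}
for name, shift in [("I", 0.0), ("II", 10.0), ("III", 1.0)]:
    d_bar = mean_1 + shift
    half_width = T_05_7DF * s_dbar
    lower, upper = d_bar - half_width, d_bar + half_width
    rejects_h0 = abs(d_bar / s_dbar) > T_05_7DF
    excludes_zero = lower > 0 or upper < 0
    assert rejects_h0 == excludes_zero  # the two verdicts always coincide
    results[name] = (lower, upper, rejects_h0)
    print(f"Case {name}: {lower:+.2f} to {upper:+.2f}; reject H0 at 5%: {rejects_h0}")
```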

4.3 — Example of a paired experiment. The preceding examples
illustrate the assumptions and formulas used in the analysis of a paired set of
data, but do not bring out the purpose of the pairing. Youden and Beale
(1) wished to find out if two preparations of a virus would produce
different effects on tobacco plants. The method employed was to rub half a
leaf of a tobacco plant with cheesecloth soaked in one preparation of the
virus extract, then to rub the second half similarly with the second extract.
The measurement of potency was the number of local lesions appearing
on the half leaf: these lesions appear as small dark rings that are easily
counted. The data in table 4.3.1 are taken from leaf number 2 on each
of 8 plants. The steps in the analysis are exactly the same as in the
preceding example. We have, however, presented the deviations of the differences from
their mean, $d_i = D_i - \bar D$, and obtained the sum of squares of deviations
directly instead of by the shortcut formula.

For a test of the null hypothesis that the two preparations produce on
the average the same number of lesions, we compute

$$t = \frac{\bar D}{s_{\bar D}} = \frac{4}{1.52} = 2.63, \qquad \text{d.f.} = n - 1 = 7$$

From table A 4, the significance probability is about 0.04, and the null
hypothesis is rejected. We conclude that in the population the second
preparation produces fewer lesions than the first. From this result we
would expect that both the 95% confidence limits for $\mu_D$ will be positive.
Since $t_{0.05}\, s_{\bar D} = (2.365)(1.52) = 3.59$, the 95% limits are +0.4 and +7.6
lesions per leaf.

TABLE 4.3.1

Number of Lesions on Halves of Eight Tobacco Leaves*

Pair   Preparation 1   Preparation 2   Difference         Deviation          Squared deviation
No.    $X_1$           $X_2$           $D = X_1 - X_2$    $d = D - \bar D$   $d^2$
1      31              18              13                  9                 81
2      20              17               3                 -1                  1
3      18              14               4                  0                  0
4      17              11               6                  2                  4
5       9              10              -1                 -5                 25
6       8               7               1                 -3                  9
7      10               5               5                  1                  1
8       7               6               1                 -3                  9
Total  120             88              32                  0                130
Mean   15              11              $\bar D = 4$

$s_D^2 = 130/7 = 18.57$

$s_{\bar D}^2 = 18.57/8 = 2.32$, $s_{\bar D} = 1.52$ lesions

* Slightly changed to make calculation easier.
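The arithmetic of table 4.3.1 can be reproduced in a few lines. The Python sketch below is our own illustration; the constant 2.365 is the two-tailed 5% value of t for 7 d.f. quoted in the text, and the deviations are squared directly rather than by the shortcut formula, as in the table.

```python
import math

# Lesion counts on the two halves of each leaf (table 4.3.1).
prep_1 = [31, 20, 18, 17, 9, 8, 10, 7]
prep_2 = [18, 17, 14, 11, 10, 7, 5, 6]
T_05_7DF = 2.365  # two-tailed 5% point of t for 7 d.f. (table A 4)

diffs = [a - b for a, b in zip(prep_1, prep_2)]   # D = X1 - X2
n = len(diffs)
d_bar = sum(diffs) / n
ss = sum((d - d_bar) ** 2 for d in diffs)          # sum of d^2, computed directly
s_dbar = math.sqrt(ss / (n - 1) / n)               # square root of s_D^2 / n
t = d_bar / s_dbar
lower, upper = d_bar - T_05_7DF * s_dbar, d_bar + T_05_7DF * s_dbar

print(f"D-bar = {d_bar:.0f}, sum d^2 = {ss:.0f}, s_Dbar = {s_dbar:.2f}, t = {t:.2f}")
print(f"95% limits: {lower:+.1f} to {upper:+.1f} lesions per leaf")
```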

In this experiment the leaf constitutes the pair. This choice was 
made as a result of earlier studies in which a single preparation was rubbed 
on a large number of leaves, the lesions found on each half-leaf being 
counted. In a new type of work, a preliminary study of this kind can be 
highly useful. Since every half-leaf was treated in the same way, the varia- 
tions found in the numbers of lesions per half leaf represent the natural 
variability of the experimental material. From the data, the investigator
can estimate the population standard deviation, from which he can in
turn estimate the size of sample needed to ensure a specified degree of
precision in the sample averages. He can also look for a good method of
forming pairs. Such a study is sometimes called a uniformity trial, because
the treatment is uniform, although a variability trial might be a
better name. 

Youden and Beale found that the two halves of the same leaf were 
good partners, since they tended to give similar numbers of lesions. An 
indication of this fact is evident in table 4.3.1, where the pairs are arranged 
in descending order of total numbers of lesions per leaf. Notice that with 
two minor exceptions, this descending order shows up in each preparation. 
If one member of a pair is high, so is the other: if one is low, so is the other. 
The numbers on the two halves of a leaf are said to be positively correlated. 
Because of this correlation, the differences between the two halves tend 
to be mostly small, and therefore less likely to mask or conceal an im- 
posed difference due to a difference in treatments. 
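The benefit of this positive correlation can be made concrete. For paired data the sum of squares of deviations of the differences equals $\sum x_1^2 + \sum x_2^2 - 2\sum x_1 x_2$, where $x_1$ and $x_2$ are deviations from the two preparation means; this is the sample counterpart of the cross-product term that appears in section 4.7. The Python sketch below, an illustration we have added, evaluates each piece for table 4.3.1.

```python
import math

# Lesion counts on the two halves of each leaf (table 4.3.1).
prep_1 = [31, 20, 18, 17, 9, 8, 10, 7]
prep_2 = [18, 17, 14, 11, 10, 7, 5, 6]
n = len(prep_1)
mean_1, mean_2 = sum(prep_1) / n, sum(prep_2) / n

ss_1 = sum((x - mean_1) ** 2 for x in prep_1)                              # sum x1^2
ss_2 = sum((x - mean_2) ** 2 for x in prep_2)                              # sum x2^2
sp_12 = sum((a - mean_1) * (b - mean_2) for a, b in zip(prep_1, prep_2))   # sum x1*x2

ss_diff = ss_1 + ss_2 - 2 * sp_12       # sum of squares of deviations of D
r = sp_12 / math.sqrt(ss_1 * ss_2)      # sample correlation between the halves

print(f"sum x1^2 = {ss_1:.0f}, sum x2^2 = {ss_2:.0f}, sum x1x2 = {sp_12:.0f}")
print(f"SS of differences = {ss_diff:.0f}; without the cross term it would be {ss_1 + ss_2:.0f}")
print(f"correlation r = {r:.2f}")
```

The cross-product sum of 255 removes most of the 468 + 172 = 640 that the differences would carry if the halves were uncorrelated, leaving the 130 of table 4.3.1; the sample correlation is about 0.90.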
EXAMPLE 4.3.1— L. C. Grove (2) determined the sample mean numbers of florets
produced by seven pairs of plots of Excellence gladiolus, one plot of each pair planted
with high (first-year) corms, the other with low (second-year or older) corms. (A corm is
an underground propagating stem.) The plot means were as follows:


Corm    Florets
High    11.2   13.3   12.8   13.7   12.2   11.9   12.1
Low     14.6   12.6   15.0   15.6   12.7   12.0   13.1


Calculate the sample mean difference. Ans. 1.2 florets. In the population of such
differences test the null hypothesis $\mu_D = 0$. Ans. P = 0.06, approximately.

EXAMPLE 4.3.2— Samples of blood were taken from each of 8 patients. In each sample
the serum albumen content of the blood was determined by each of two laboratory
methods A and B. The objective was to discover whether there was a consistent difference
in the amount of serum albumen found by the two methods. The 8 differences (A - B) were
as follows: 0.6, 0.7, 0.8, 0.9, 0.3, 0.5, -0.5, 1.3, the units being gm. per 100 ml. Compute
t to test the null hypothesis ($H_0$) that the population mean of these differences is zero, and
report the approximate value of your significance probability. What is the conclusion?
Ans. t = 2.511 with 7 d.f., P between 0.05 and 0.025. Method A has a systematic tendency to
give higher values.

EXAMPLE 4.3.3— Mitchell, Burroughs, and Beadles (3) computed the biological
values of proteins from raw peanuts (P) and roasted peanuts (R) as determined in an
experiment with 10 pairs of rats. The pairs of data P, R are as follows: 61, 55; 60, 54; 56, 47;
63, 59; 56, 51; 63, 61; 59, 57; 56, 54; 44, 63; 61, 58. Compute the sample mean difference,
2.0, and the sample standard deviation of the differences, 7.72 units. Since t = 0.82,
over 40% of similar samples from a population with $\mu_D = 0$ would be expected to have
larger t-values.
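The weight of a single aberrant pair can be gauged by recomputing t with and without it. The Python sketch below (our own illustration; the helper name `paired_t` is hypothetical) does this for the peanut data. Setting the pair (44, 63) aside is shown only to measure its influence, not as a recommendation; what to do about such observations is the open question raised in the note that follows.

```python
import math

# Biological values for raw (P) and roasted (R) peanuts, 10 pairs of rats.
raw = [61, 60, 56, 63, 56, 63, 59, 56, 44, 61]
roasted = [55, 54, 47, 59, 51, 61, 57, 54, 63, 58]


def paired_t(x, y):
    """Return (D-bar, s_D, t) for the differences x - y under H0: mu_D = 0."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    return d_bar, s_d, d_bar / (s_d / math.sqrt(n))


all_pairs = paired_t(raw, roasted)
# Index 8 is the next-to-last pair (44, 63); omitted here only to show its influence.
keep = [i for i in range(len(raw)) if i != 8]
without_pair_9 = paired_t([raw[i] for i in keep], [roasted[i] for i in keep])

print("all 10 pairs:    D-bar=%.2f  s_D=%.2f  t=%.2f" % all_pairs)
print("pair 9 omitted:  D-bar=%.2f  s_D=%.2f  t=%.2f" % without_pair_9)
```

With all ten pairs t is about 0.82; with the ninth pair set aside it rises to about 5.4, which shows how heavily one observation can weigh in a sample of this size.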

Note: 9 of the 10 differences, P - R, are positive. One would like some information
about the next-to-the-last pair, 44, 63. The first member seems abnormal. While unusual
individuals like this do occur in the most carefully conducted trials, their appearance
demands immediate investigation. Doubtless an error in recording or computation was
searched for but not found. What should be done about such aberrant observations is a
moot question; their occurrence detracts from one’s confidence in the experiment.

EXAMPLE 4.3.4— A man starting work in a new town has two routes A and B by which
he may drive home. He conducts an experiment to find out which route is quicker. Since
traffic is unusually heavy on Mondays and Fridays but does not seem to vary much from
week to week, he selects the day of the week as the basis for pairing. The test lasts four weeks.
On the first Monday he tosses a coin to decide whether to drive by route A or B. On the
second Monday he drives by the other route. On the third Monday, he again tosses a coin,
using the other route on the fourth Monday, and similarly for the other days of the week.
The times taken, in minutes, were as follows:



Route   M1     M2     Tu1    Tu2    W1     W2     Th1    Th2    F1     F2
A       28.7   26.2   24.8   25.3   25.1   23.9   26.1   25.8   30.3   31.4
B       25.4   25.8   24.9   25.0   23.9   23.3   26.6   24.8   28.8   30.3


(i) Treating the data as consisting of 10 pairs, test whether there seems to be any real
difference in average driving times between A and B. (ii) Compute 95% confidence limits for the
population mean difference. What would you regard as the population in this trial? (iii)
By eye inspection of the results, does the pairing look effective? (iv) Suppose that on the
last Friday (F2) there had been a fire on route B, so that the time taken to get home was 48
minutes. Would you recommend rejecting this pair from the analysis? Give your reason.
Ans. (i) t = 2.651, with 9 d.f., P about 0.03. Method B seems definitely quicker. (ii) 0.12 to
1.63 mins. There really isn’t much difference. (iii) Highly effective.
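A Python sketch of parts (i) and (ii), added for illustration, uses the 5% value 2.262 of t for 9 d.f. from table A 4; the helper name `paired` is hypothetical. Carried at full precision, the lower limit comes out near 0.13, a shade above the printed 0.12. The last lines substitute the hypothetical 48 minutes of part (iv) for the F2 time on route B, only to show how far one pair can move t; they do not settle whether the pair should be rejected.

```python
import math

# Driving times in minutes. Each column (M1 ... F2) is one pair:
# two trips on the same weekday, one by each route.
route_a = [28.7, 26.2, 24.8, 25.3, 25.1, 23.9, 26.1, 25.8, 30.3, 31.4]
route_b = [25.4, 25.8, 24.9, 25.0, 23.9, 23.3, 26.6, 24.8, 28.8, 30.3]
T_05_9DF = 2.262  # two-tailed 5% point of t for 9 d.f. (table A 4)


def paired(a, b):
    """Return (D-bar, s_Dbar, t) for the differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    s_dbar = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1) / n)
    return d_bar, s_dbar, d_bar / s_dbar


d_bar, s_dbar, t = paired(route_a, route_b)
lower, upper = d_bar - T_05_9DF * s_dbar, d_bar + T_05_9DF * s_dbar
print(f"(i)  D-bar = {d_bar:.2f} min, t = {t:.3f} with 9 d.f.")
print(f"(ii) 95% limits: {lower:.2f} to {upper:.2f} min")

# Part (iv): the hypothetical fire makes the last route-B time 48 minutes.
route_b_fire = route_b[:-1] + [48.0]
print(f"(iv) t with the 48-minute trip included: {paired(route_a, route_b_fire)[2]:.2f}")
```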

4.4 — Conditions for pairing. The objective of pairing is to increase
the precision of the comparison of the two procedures. Identical twins
are natural pairs. Litter mates of the same sex are often paired
successfully, because they usually behave more nearly alike than do animals less
closely related. If the measurement at the end of the experiment is the
subject’s ability to perform some task (e.g., to do well in an exam),
subjects similar in natural ability and previous training for this task should
be paired. Often the subjects are tested at the beginning of the trial to
provide information for forming pairs. Similarly, in experiments that
compare two methods of treating sick persons, patients whose prognosis
appears about the same at the beginning of the trial should be paired if
feasible.

The variable on which we pair should predict accurately the
performance of the subjects on the measurement by which the effects of the
treatments are to be judged. Little will be gained by pairing students on
their I.Q.’s if I.Q. is not closely related to ability to perform the particular
task that is being measured in the experiment.

Self-pairing is highly effective when an individual’s performance is
consistent on different occasions, but yet exhibits wide variation when
comparisons are made from one individual to another. If two methods
of conducting a chemical extraction are being compared, the pair is likely
to be a sample of the original raw material which is thoroughly mixed
and divided into two parts.

Environmental variation often calls for pairing. Two treatments
should be laid down side by side in the field or on the greenhouse bench
in order to avoid the effects of unnecessary differences in soil, moisture,
temperature, etc. Two plots or pots next to each other usually respond
more nearly alike than do those at a distance. As a final illustration,
sometimes the measuring process is lengthy and at least partly subjective,
as in certain psychiatric studies. If several judges must be used to make
the measurements for comparing two treatments A and B, each scoring a
different group of patients, an obvious precaution is to ensure that each
judge scores as many A patients as B patients. Even if the patients were
not originally paired, they could be paired for assignment to judges.

Before an experiment has been conducted, it is of course not possible
to foretell how effective a proposed pairing will be in increasing precision.
However, from the results of a paired experiment, its precision may be
compared with that of the corresponding unpaired experiment (section
4.11).

4.5 — Tests of other null hypotheses about $\mu$. The null hypothesis
$\mu_D = 0$ is not the only one that is useful, and the alternative may be $\mu_D > 0$
instead of $\mu_D \neq 0$. Illustrations are found in a Boone County survey of
corn borer effects. On 14 farms, the effect of spraying was evaluated by
measuring the corn yield from both sprayed and unsprayed strips in each
field. The data are recorded in table 4.5.1. The sample mean difference
is 4.7 bu./acre with $s_D = 6.48$ bu./acre and $s_{\bar D} = 6.48/\sqrt{14} = 1.73$ bu./acre.

TABLE 4.5.1

Yields of Corn (Bushels Per Acre) in Sprayed and Unsprayed Strips of 14 Fields,
Boone County, Iowa, 1950

Field   Sprayed   Unsprayed   Difference
1        64.3      70.0        -5.7
2        78.1      74.4         3.7
3        93.0      86.6         6.4
4        80.7      79.2         1.5
5        89.0      84.7         4.3
6        79.9      75.1         4.8
7        90.6      87.3         3.3
8       102.4      98.8         3.6
9        70.7      70.2         0.5
10      106.1     101.1         5.0
11      107.4      83.4        24.0
12       74.0      65.2         8.8
13       72.6      68.1         4.5
14       69.5      68.4         1.1

A one-tailed t-test. It had already been established that the spray, at
the concentration used, could not decrease yield. If there is a decrease, as
in the first field, it must be attributed to causes other than the spray, or to
sampling variation. Consequently, if $\mu_D$ is not zero, then it must be greater
than zero. The objective of this experiment was to test $H_0: \mu_D = 0$ with
$H_A: \mu_D > 0$. As before,

$$t = \bar D/s_{\bar D} = 4.7/1.73 = 2.72, \qquad \text{d.f.} = 13$$

To make a one-tailed test with table A 4, locate the sample value of t
and use half of the probability indicated.

Applying this rule to the t = 2.72 above, P is slightly less than 0.02/2;
the null hypothesis is rejected at P < 0.01. Evidently spraying did decrease
corn borer damage, resulting in increased yields in Boone County in 1950. 
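The one-tailed test can be sketched in Python from the raw yields of table 4.5.1 (an illustration we have added). The four critical values are the two-tailed entries of table A 4 for 13 d.f.; halving each column heading gives the one-tailed probability, as the rule above says. At full precision t is about 2.71 (2.72 comes from dividing by the rounded 1.73), which does not change the verdict.

```python
import math

sprayed = [64.3, 78.1, 93.0, 80.7, 89.0, 79.9, 90.6,
           102.4, 70.7, 106.1, 107.4, 74.0, 72.6, 69.5]
unsprayed = [70.0, 74.4, 86.6, 79.2, 84.7, 75.1, 87.3,
             98.8, 70.2, 101.1, 83.4, 65.2, 68.1, 68.4]
# Two-tailed probability -> critical t for 13 d.f. (table A 4).
T13_TWO_TAILED = {0.10: 1.771, 0.05: 2.160, 0.02: 2.650, 0.01: 3.012}

diffs = [s - u for s, u in zip(sprayed, unsprayed)]
n = len(diffs)
d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
s_dbar = s_d / math.sqrt(n)
t = d_bar / s_dbar

# One-tailed rule: halve the two-tailed probability of each column that t exceeds,
# and report the smallest such level as an upper bound on P.
one_tailed_levels = [p / 2 for p, crit in T13_TWO_TAILED.items() if t > crit]
p_bound = min(one_tailed_levels) if one_tailed_levels else None

print(f"D-bar = {d_bar:.1f}, s_D = {s_d:.2f}, s_Dbar = {s_dbar:.2f}, t = {t:.2f}")
print(f"one-tailed P < {p_bound}")
```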

Test of a non-zero $\mu$. This same Boone County experiment may be
cited to illustrate the use of a null hypothesis different from $\mu_D = 0$. This
experiment might have had as its objective the test of the null hypothesis,
“The cost of spraying is equal to the gain from increased yield.” To
evaluate costs, the fee of commercial sprayers was 3 dollars per acre and the
1950 crop was sold at about 1.50 dollars per bushel. So 2 bushels per acre would
pay for the spraying. This test would be $H_0: \mu_D = 2$ bu./acre, $H_A: \mu_D \neq 2$
bu./acre, resulting in

$$t = \frac{4.7 - 2.0}{1.73} = 1.56, \qquad \text{d.f.} = 13$$
1.73 
The two-tailed probability is about P = 0.15, and the null hypothesis
would presumably not be rejected. The verdict of the test is inconclusive: 
it provides no strong evidence that the farmers will either gain or lose by 
spraying. 

One-tailed test of a non-zero $\mu$. It is possible that $H_0: \mu_D = 2$ bu./acre
might be tested with $H_A: \mu_D > 2$ bu./acre; that is, the alternative hypothesis
might be put in the form of a slogan, “It pays to spray.” If this were done,
t = 1.56 would be associated with P = 0.15/2 = 0.075, not significant.
But the implication of this one-sided test is that $H_0$ would be accepted
no matter how far the sample mean might fall short of 2 bu./acre. It is
the two-tailed test which is appropriate here.

This point is stressed for the reason that some people use the one-sided
test because, as a man said, “I am not interested in the other alternative.”
A one-tailed test of $H_0: \mu_D = \mu_0$ against $H_A: \mu_D > \mu_0$ should be
used only if we know enough about the nature of the process being studied
to be certain that $\mu_D$ could not be less than $\mu_0$.

In considering the profitability of spraying, it is more informative to
treat the statistical problem as one of estimation than as one of testing
hypotheses. Since the mean difference in yield between sprayed and
unsprayed strips is 4.7 bu. per acre, the sample estimate of the profit per acre
due to spraying is 2.7 bu. We can compute confidence limits for the
average profit per acre over a population of fields of which this is a random
sample. For 90% limits we add and subtract $t_{0.10}\, s_{\bar D} = (1.771)(1.73) = 3.1$
bu. Thus if the farmers are willing to take a 1-in-10 chance that the sample
estimate was not exceptionally poor, they learn that the average profit per
acre lies somewhere between -0.4 bu. and +5.8 bu. These limits are
unfortunately rather wide for a practical decision: a larger sample size
would be necessary to narrow the limits. They do indicate, however, that
although there is the possibility of a small loss, there is also the possibility
of a substantial profit. The 95% limits, -1.0 bu. and +6.4 bu., tell much
the same story.
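The estimation approach reduces to a few lines of arithmetic. This Python sketch, added for illustration, starts from the rounded summary figures used in the text ($\bar D = 4.7$ and $s_{\bar D} = 1.73$ bu./acre, with a break-even gain of 2 bu./acre) and the 13-d.f. values 1.771 and 2.160 from table A 4.

```python
D_BAR = 4.7        # mean gain from spraying, bu./acre
S_DBAR = 1.73      # standard error of D-bar, bu./acre
BREAK_EVEN = 2.0   # bu./acre that pays for spraying at 1950 prices
# Two-tailed t values for 13 d.f. (table A 4), keyed by confidence level.
T13 = {0.90: 1.771, 0.95: 2.160}

profit = D_BAR - BREAK_EVEN
t_break_even = profit / S_DBAR          # test of H0: mu_D = 2 bu./acre
limits = {level: (profit - t * S_DBAR, profit + t * S_DBAR)
          for level, t in T13.items()}

print(f"estimated profit = {profit:.1f} bu./acre, t = {t_break_even:.2f}")
for level, (lo, hi) in limits.items():
    print(f"{level:.0%} limits: {lo:+.1f} to {hi:+.1f} bu./acre")
```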

EXAMPLE 4.5.1— In an investigation of the effect of feeding 10 mcg. of vitamin B12
per pound of ration to growing swine (4), 8 lots (each with 6 pigs) were fed in pairs; the
pairs were distinguished by being fed different levels of aureomycin, an antibiotic which
did not interact with the vitamin; that is, the differences were not affected by the aureomycin.
The average daily gains (to about 200 lbs. live weight) are summarized as follows:



                 Pairs of Lots
Ration           1      2      3      4      5      6      7      8
With B12         1.60   1.68   1.75   1.64   1.75   1.79   1.78   1.77
Without B12      1.56   1.52   1.52   1.49   1.59   1.56   1.60   1.56
Difference, D    0.04   0.16   0.23   0.15   0.16   0.23   0.18   0.21


For the differences, calculate the statistics $\bar D = 0.170$ lb./day and $s_{\bar D} = 0.0217$ lb./day.
EXAMPLE 4.5.2 — It is known that the addition of small amounts of the vitamin cannot
decrease the rate of growth. While it is fairly obvious that $\bar D$ will be found significantly
different from zero, the differences being all positive and, with one exception, fairly
consistent, you may be interested in evaluating t. Ans. 7.83, far beyond the 0.01 level in the
table. The appropriate alternative hypothesis is $\mu_D > 0$.

EXAMPLE 4.5.3 — The effect of B12 seems to be a stimulation of the metabolic processes
including appetite. The pigs eat more and grow faster. In the experiment above, the cost
of the additional amount of feed eaten, including that of the vitamin, corresponded to about
0.130 lb./day of gain. Test the hypothesis that the profit derived from feeding B12 is zero.
Ans. t = 1.84, P = 0.11 (two-sided alternative).

4.6 — Comparison of the means of two independent samples. When no
pairing has been employed, we have two independent samples with means
$\bar X_1$, $\bar X_2$, which are estimates of their respective population means $\mu_1$, $\mu_2$.
Tests of significance and confidence intervals concerning the population
difference, $\mu_1 - \mu_2$, are again based on the t-distribution, where t now has
the value

$$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{s_{\bar X_1 - \bar X_2}}$$

It is assumed that $\bar X_1$ and $\bar X_2$ are normally distributed and are independent.
By theory, their difference is also normally distributed, so that the
numerator of t is normal with mean zero.

The denominator of t is a sample estimate of the standard error of
$(\bar X_1 - \bar X_2)$. The background for this estimate is given in the next two
sections. First, we need an important new result for the population variance
of a difference between any two variables $X_1$ and $X_2$:

$$\sigma^2_{X_1 - X_2} = \sigma^2_{X_1} + \sigma^2_{X_2}$$

The variance of a difference is the sum of the variances. This result holds
for any two variables, whether normal or not, provided they are
independently distributed.

4.7 — The variance of a difference. A population variance is defined
(section 2.12) as the average, over the population, of the squared
deviations from the population mean. Thus we may write

$$\sigma^2_{X_1 - X_2} = \text{Avg. of } \{(X_1 - X_2) - (\mu_1 - \mu_2)\}^2$$

But,

$$(X_1 - X_2) - (\mu_1 - \mu_2) = (X_1 - \mu_1) - (X_2 - \mu_2)$$

Hence, on squaring and expanding,

$$\{(X_1 - X_2) - (\mu_1 - \mu_2)\}^2 = (X_1 - \mu_1)^2 + (X_2 - \mu_2)^2 - 2(X_1 - \mu_1)(X_2 - \mu_2)$$

Now average over all pairs of values $X_1$, $X_2$ that can be drawn from
their respective populations. By the definition of a population variance,

$$\text{Avg. of } (X_1 - \mu_1)^2 = \sigma^2_{X_1}, \qquad \text{Avg. of } (X_2 - \mu_2)^2 = \sigma^2_{X_2}$$

This leads to the general result

$$\sigma^2_{X_1 - X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} - 2\,\text{Avg. of } (X_1 - \mu_1)(X_2 - \mu_2) \qquad (4.7.1)$$

At this point we use the fact that $X_1$ and $X_2$ are independently drawn.
Because of this independence, any specific value of $X_1$ will appear with
all the values of $X_2$ that can be drawn from its population. Hence, for
this specific value of $X_1$,

$$\text{Avg. of } (X_1 - \mu_1)(X_2 - \mu_2) = (X_1 - \mu_1)\{\text{Avg. of } (X_2 - \mu_2)\} = 0$$

since $\mu_2$ is the mean or average of all the values of $X_2$. It follows that the
overall average of the cross-product term $(X_1 - \mu_1)(X_2 - \mu_2)$ is zero, so
that

$$\sigma^2_{X_1 - X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} \qquad (4.7.2)$$
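In the spirit of the experimental sampling used throughout this book, the Python sketch below checks (4.7.1) and (4.7.2) by simulation. It is an illustration we have added; the distributions, seed, and sample size are arbitrary choices. $X_1$ is normal with variance 9 and $X_2$ is uniform on 0 to 12, so its variance is 12; because they are drawn independently, the variance of $X_1 - X_2$ should come out close to 21 even though $X_2$ is not normal. A third variable $X_3 = X_1 + E$, which shares $X_1$ and so is not independent of it, shows the cross-product term at work.

```python
import random

random.seed(47)
N = 100_000


def avg(values):
    return sum(values) / len(values)


def variance(values):
    """Population-style variance: average squared deviation from the mean."""
    m = avg(values)
    return avg([(v - m) ** 2 for v in values])


def avg_cross_product(xs, ys):
    """Average of (x - mean_x)(y - mean_y), the term in (4.7.1)."""
    mx, my = avg(xs), avg(ys)
    return avg([(x - mx) * (y - my) for x, y in zip(xs, ys)])


x1 = [random.gauss(0.0, 3.0) for _ in range(N)]      # variance 9
x2 = [random.uniform(0.0, 12.0) for _ in range(N)]   # variance 12**2 / 12 = 12
diff_12 = [a - b for a, b in zip(x1, x2)]

# Independent case: the cross-product average is near 0, so (4.7.2) holds.
print(f"var(X1 - X2) = {variance(diff_12):.2f}; "
      f"var(X1) + var(X2) = {variance(x1) + variance(x2):.2f}")

# Dependent case: X3 contains X1, so the cross-product term cannot be dropped.
x3 = [a + random.uniform(0.0, 12.0) for a in x1]
diff_13 = [a - c for a, c in zip(x1, x3)]
print(f"var(X1 - X3) = {variance(diff_13):.2f}; "
      f"var(X1) + var(X3) = {variance(x1) + variance(x3):.2f}; "
      f"(4.7.1) gives {variance(x1) + variance(x3) - 2 * avg_cross_product(x1, x3):.2f}")
```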

Apply this result to two means $\bar X_1$, $\bar X_2$, drawn from populations with
variance $\sigma^2$. With samples of size n, each mean has variance $\sigma^2/n$. This
gives

$$\sigma^2_{\bar X_1 - \bar X_2} = 2\sigma^2/n$$

The variance of a difference is twice the variance of an individual mean.

If $\sigma$ is known, the preceding results provide the material for tests and
confidence intervals concerning $\mu_1 - \mu_2$. To illustrate, from the table of
pig gains (table 3.2.1) which we used to simulate a normal distribution
with $\sigma = 10$ pounds, the first two samples drawn gave $\bar X_1 = 35.6$ and
$\bar X_2 = 29.3$ pounds, with n = 10. Since the standard error of $\bar X_1 - \bar X_2$ is
$\sqrt{2}\,\sigma/\sqrt{n}$, the quantity

$$Z = \sqrt{n}\,\{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)\}/(\sqrt{2}\,\sigma)$$

is a normal deviate. To test the null hypothesis that $\mu_1 = \mu_2$ we compute

$$Z = \frac{\sqrt{n}\,(\bar X_1 - \bar X_2)}{\sqrt{2}\,\sigma} = \frac{\sqrt{10}\,(6.3)}{\sqrt{2}\,(10)} = \frac{19.92}{14.14} = 1.41$$

From table A 3 a larger value of Z, ignoring sign, occurs in about 16% of the
trials. As we would expect, the difference is not significant. The 95%
confidence limits for $(\mu_1 - \mu_2)$ are

$$(\bar X_1 - \bar X_2) \pm (1.96)\sqrt{2}\,\sigma/\sqrt{n}$$
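The same arithmetic in Python, as an illustrative sketch using only the standard library. For a standard normal deviate the two-tailed probability $P(|Z| > z)$ equals $\mathrm{erfc}(z/\sqrt{2})$, which `math.erfc` provides; we use it here in place of looking up table A 3.

```python
import math

SIGMA = 10.0            # known population S.D., pounds
N = 10                  # size of each sample
xbar_1, xbar_2 = 35.6, 29.3

se_diff = math.sqrt(2) * SIGMA / math.sqrt(N)     # standard error of X1bar - X2bar
z = (xbar_1 - xbar_2) / se_diff                   # test of H0: mu_1 = mu_2
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| > |z|) for a standard normal
lower = (xbar_1 - xbar_2) - 1.96 * se_diff
upper = (xbar_1 - xbar_2) + 1.96 * se_diff

print(f"Z = {z:.2f}, two-tailed P = {p_two_tailed:.3f}")
print(f"95% limits for mu_1 - mu_2: {lower:.2f} to {upper:.2f} pounds")
```

Z is about 1.41 with P about 0.16, and the 95% limits run from about -2.5 to +15.1 pounds, an interval that includes zero.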

4.8 — A pooled estimate of variance. In most applications the value
of $\sigma^2$ is not known. However, each sample furnishes an estimate of $\sigma^2$:
call these estimates $s_1^2$ and $s_2^2$. With samples of the same size n, the best
combined estimate is their pooled average $s^2 = (s_1^2 + s_2^2)/2$.
Since $s_1^2 = \sum x_1^2/(n - 1)$ and $s_2^2 = \sum x_2^2/(n - 1)$, where, as usual,
$x_1 = X_1 - \bar X_1$ and $x_2 = X_2 - \bar X_2$, we may write

$$s^2 = \frac{\sum x_1^2 + \sum x_2^2}{2(n - 1)}$$

This formula is recommended for routine computing since it is quicker
and extends easily to samples of unequal sizes.

The number of degrees of freedom in the pooled $s^2$ is $2(n - 1)$, the
sum of the d.f. in $s_1^2$ and $s_2^2$. This leads to the result that

$$t = \sqrt{n}\,\{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)\}/(\sqrt{2}\, s)$$

follows Student’s t-distribution with $2(n - 1)$ d.f.

The preceding analysis requires one additional assumption, namely
that $\sigma$ is the same in the two populations. The situations in which this
assumption is suspect and the comparison of $\bar X_1$ and $\bar X_2$ when the
assumption does not hold are discussed in section 4.14.

It is now time to apply these methods to a real experiment. 

4.9 — An experiment comparing two groups of equal size. Breneman
(5) compared the 15-day mean comb weights of two lots of male chicks,
one receiving sex hormone A (testosterone), the other C
(dehydroandrosterone). Day-old chicks, 11 in number, were assigned at random
to each of the treatments. To distinguish between the two lots, which
were caged together, the heads of the chicks were stained red and purple
respectively. The individual comb weights are recorded in table 4.9.1.

The calculations for the test of significance are given at the foot of
the table. Note that in the Hormone A sample the correction term
$(\sum X)^2/n$ is $(1{,}067)^2/11 = 103{,}499$. Note also the method recommended
for computing the pooled $s^2$. With 20 d.f., the value of t is significant at
the 1% level. Hormone A gives higher average comb weights than
hormone C. The two sums of squares of deviations, 8,472 and 7,748,
make the assumption of equal $\sigma^2$ appear reasonable.

The 95% confidence limits for $(\mu_1 - \mu_2)$ are

$$\bar X_1 - \bar X_2 \pm t_{0.05}\, s_{\bar X_1 - \bar X_2}$$

or, in this example,

41 - (2.086)(12.1) = 16 mg., and 41 + (2.086)(12.1) = 66 mg.
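The whole computation for the comb-weight experiment, from the raw weights of table 4.9.1 to the confidence limits, fits in a short Python sketch (our own illustration; the helper name `sum_sq_dev` is hypothetical). It follows the recommended route of pooling the two sums of squares of deviations, and 2.086 is the 5% value of t for 20 d.f. from table A 4. At full precision the limits are about 15.7 and 66.3 mg., which round to the 16 and 66 mg. obtained above with the standard error rounded to 12.1.

```python
import math

hormone_a = [57, 120, 101, 137, 119, 117, 104, 73, 53, 68, 118]  # comb weights, mg.
hormone_c = [89, 30, 82, 50, 39, 22, 57, 32, 96, 31, 88]
T_05_20DF = 2.086  # two-tailed 5% point of t for 20 d.f. (table A 4)


def sum_sq_dev(xs):
    """Sum of squares of deviations: sum X^2 minus the correction (sum X)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


n = len(hormone_a)                      # both groups have n = 11
ss_a, ss_c = sum_sq_dev(hormone_a), sum_sq_dev(hormone_c)
pooled_s2 = (ss_a + ss_c) / (2 * (n - 1))
se_diff = math.sqrt(2 * pooled_s2 / n)  # standard error of the difference in means
diff = sum(hormone_a) / n - sum(hormone_c) / n
t = diff / se_diff
lower, upper = diff - T_05_20DF * se_diff, diff + T_05_20DF * se_diff

print(f"sums of squares {ss_a:.0f} and {ss_c:.0f}, pooled s^2 = {pooled_s2:.0f}")
print(f"difference = {diff:.0f} mg., s = {se_diff:.2f}, t = {t:.2f} with {2 * (n - 1)} d.f.")
print(f"95% limits: {lower:.1f} to {upper:.1f} mg.")
```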

TABLE 4.9.1

Testing the Difference Between the Means of Two Independent Samples

Weight of Comb (mgs.)

               Hormone A    Hormone C
                  57           89
                 120           30
                 101           82
                 137           50
                 119           39
                 117           22
                 104           57
                  73           32
                  53           96
                  68           31
                 118           88
Totals         1,067          616
n                 11           11
Means             97           56
$\sum X^2$       111,971       42,244
$(\sum X)^2/n$   103,499       34,496
$\sum x^2$         8,472        7,748
d.f.              10           10

Pooled $s^2 = (8{,}472 + 7{,}748)/(10 + 10) = 811$, d.f. = 20

$s_{\bar X_1 - \bar X_2} = \sqrt{2s^2/n} = \sqrt{2(811)/11} = 12.14$ mg.

$t = (\bar X_1 - \bar X_2)/s_{\bar X_1 - \bar X_2} = 41/12.14 = 3.38$

EXAMPLE 4.9.1 — Lots of 10 bees were fed two concentrations of syrup, 20% and
65%, at a feeder half a mile from the hive (6). Upon arrival at the hive their honey sacs
were removed and the concentration of the fluid measured. In every case there was a
decrease from the feeder concentration. The decreases were: from the 20% syrup, 0.7, 0.5, 0.4,
0.7, 0.5, 0.4, 0.7, 0.4, 0.2, and 0.5; from the 65% syrup, 1.7, 2.8, 2.2, 1.4, 1.3, 2.1, 0.8, 3.4,
1.9, and 1.4%. Here, every observation in the second sample is larger than any in the first,
so that rather obviously $\mu_1 < \mu_2$. Show that t = 5.6 if $\mu_1 - \mu_2 = 0$. There is little doubt
that, under the experimental conditions imposed, the concentration during flight decreases
more with the 65% syrup. But how about equality of variances? See sections 4.14 and
4.15 for further discussion.

EXAMPLE 4.9.2— Four determinations of the pH of Shelby loam were made with
each of two types of glass electrode (7). With a modified quinhydrone electrode, the
readings were 5.78, 5.74, 5.84, and 5.80; while with a modified Ag/AgCl electrode, they were
5.82, 5.87, 5.96, and 5.89. With the hypothesis that $\mu_1 - \mu_2 = 0$, calculate t = 2.66. Note:
if you subtract 5.74 from every observation, the calculations are simpler.

EXAMPLE 4.9.3—In experiments to measure the effectiveness of carbon tetrachloride
as a worm-killer, each of 10 rats received an injection of 500 larvae of the worm
Nippostrongylus muris. Eight days later 5 of the rats, chosen at random, each received 0.126 cc.
of a solution of carbon tetrachloride, and two days later the rats were killed and the numbers
of adult worms counted. These numbers were 378, 275, 412, 265, and 286 for the control
rats and 123, 143, 192, 40, and 259 for the rats treated with CCl4. Find the significance
probability for the difference in mean numbers of worms, and compute 95% confidence
limits for this difference. Ans. $t = 3.64$ with 8 d.f.; P close to 0.01. Confidence limits are
63 and 280.
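A sketch of the computation asked for in Example 4.9.3. One assumption to flag: the two-tailed 5% point of t for 8 d.f. (2.306) is copied from a t table rather than computed, since Python's standard library has no t distribution. The helper names are illustrative.

```python
import math

# Adult worm counts from Example 4.9.3
control = [378, 275, 412, 265, 286]
treated = [123, 143, 192, 40, 259]


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


n = len(control)                       # both groups have 5 rats
df = 2 * (n - 1)                       # 8 d.f.
s2 = (sum_sq_dev(control) + sum_sq_dev(treated)) / df
se = math.sqrt(2 * s2 / n)             # standard error of the difference in means
diff = mean(control) - mean(treated)
t = diff / se

T_05_8DF = 2.306  # two-tailed 5% point of t for 8 d.f., copied from a t table
lower, upper = diff - T_05_8DF * se, diff + T_05_8DF * se
print(f"t = {t:.2f} with {df} d.f.; 95% limits {lower:.1f} to {upper:.1f}")
```

The script gives t of about 3.64 and limits of roughly 63.0 and 280.6 worms.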

EXAMPLE 4.9.4—Fifteen kernels of mature Iodent corn were tested for crushing
resistance. Measured in pounds the resistances were: 50, 36, 34, 45, 56, 42, 53, 25, 65,
33, 40, 42, 39, 43, 42. Another batch of 15 kernels was tested after being harvested in the
dough stage: 43, 44, 51, 40, 29, 49, 39, 59, 43, 48, 67, 44, 46, 54, 64. Test the significance
of the difference between the two means. Ans. $t = 1.38$.

EXAMPLE 4.9.5—In reading reports of researches it is sometimes desirable to supply
a test of significance which was not considered necessary by the author. As an example,
Smith (8) gave the sample mean yields and their standard errors for two crosses of maize
as 8.84 ± 0.39 and 7.00 ± 0.18 grams. Each mean was the average of five replications.
Determine if the mean difference is significant. Ans. $t = 4.29$, d.f. = 8, P < 0.5%. To do
this in the quickest way, satisfy yourself that the estimate of the variance of the difference
between the two means is the sum of the squares of 0.39 and 0.18, namely 0.1845.

4.10 — Groups of unequal sizes. Unequal numbers are common in 
comparisons made from survey data as, for example, comparing the mean 
incomes of men of similar ages who have master’s and bachelor’s degrees, 
or the severity of injury suffered in auto accidents by drivers wearing seat 
belts and drivers not wearing seat belts. In planned experiments, equal 
numbers are preferable, being simpler to analyze and more efficient, but 
equality is sometimes impossible or inconvenient to attain. Two lots of 
chicks from two batches of eggs treated differently nearly always differ in 
the number of birds hatched. Occasionally, when a new treatment is in 
short supply, an experiment with unequal numbers is set up deliberately. 

Unequal numbers occur also in experiments because of accidents and 
losses during the course of the trial. In such cases the investigator should 
always consider whether any loss represents a failure of the treatment 
rather than an accident that is not to be blamed on the treatment. Need- 
less to say, such situations require careful judgment. 

The statistical analysis for groups of unequal sizes follows almost 
exactly the same pattern as that for groups of equal sizes. As before, we 
assume that the variance is the same in both populations unless otherwise 
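indicated. With samples of sizes $n_1$ and $n_2$, their means $\bar X_1$ and $\bar X_2$ have
variances $\sigma^2/n_1$ and $\sigma^2/n_2$. The variance of the difference is then

$$\sigma^2_{\bar X_1 - \bar X_2} = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = \sigma^2\,\frac{n_1 + n_2}{n_1 n_2}$$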

In order to form a pooled estimate of $\sigma^2$, we follow the rule given for
equal-sized samples. Add the sums of squares of deviations in the numerators
of $s_1^2$ and $s_2^2$, and divide by the sum of their degrees of freedom.
These degrees of freedom are $(n_1 - 1)$ and $(n_2 - 1)$, so that the
denominator of the pooled $s^2$ is $(n_1 + n_2 - 2)$. This quantity is also the
number of d.f. in the pooled $s^2$. The procedure will be clear from the
example in table 4.10.1. Note how closely the calculations follow those
given in table 4.9.1 for samples of equal sizes.
TABLE 4.10.1

Analysis for Two Samples of Unequal Sizes. Gains in Weights of Two Lots
of Female Rats (28-84 days old) Under Two Diets

Gains (gms.)

                 High Protein    Low Protein
                      134             70
                      146            118
                      104            101
                      119             85
                      124            107
                      161            132
                      107             94
                       83
                      113
                      129
                       97
                      123

Totals              1,440            707
n                      12              7
Means                 120            101
ΣX²               177,832         73,959
(ΣX)²/n           172,800         71,407
Σx²                 5,032          2,552
df                     11              6

Pooled $s^2 = \frac{5{,}032 + 2{,}552}{11 + 6} = 446.12$, df = 17

$s_{\bar X_1 - \bar X_2} = \sqrt{s^2\,\frac{n_1 + n_2}{n_1 n_2}} = \sqrt{\frac{(446.12)(19)}{84}} = 10.04$ gms.

$t = 19/10.04 = 1.89$, P about 0.08

The high protein diet showed a slightly greater mean gain. Since P
is about 0.08, however, a difference as large as the observed one would
occur about 1 in 12 times by chance, so that the observed difference
cannot be regarded as established by the usual standards in tests of
significance.

For evidence about homogeneity of variance in the two populations,
observe that $s_1^2 = 5{,}032/11 = 457$ and $s_2^2 = 2{,}552/6 = 425$.

If the investigator is more interested in estimates than in tests, he may
prefer the confidence interval. He reports an observed difference of 19
gms. in favor of the high protein diet, with 95% confidence limits -2.2
and 40.2 gms.
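For unequal sizes only the standard-error line changes. The sketch below, again standard-library Python with illustrative names, reproduces table 4.10.1 and the 95% confidence limits just quoted. The two-tailed 5% point of t for 17 d.f. (2.110) is an input copied from a t table.

```python
import math

# Gains (gms.) from table 4.10.1
high = [134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123]
low = [70, 118, 101, 85, 107, 132, 94]


def ss(xs):
    """Sum of squares of deviations, computed as sum(X^2) - (sum X)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


n1, n2 = len(high), len(low)
df = n1 + n2 - 2                                   # 17 d.f.
s2 = (ss(high) + ss(low)) / df                     # pooled s^2
se = math.sqrt(s2 * (n1 + n2) / (n1 * n2))         # s of the difference in means
diff = sum(high) / n1 - sum(low) / n2
t = diff / se

T_05_17DF = 2.110  # two-tailed 5% point of t for 17 d.f., copied from a t table
limits = (diff - T_05_17DF * se, diff + T_05_17DF * se)
print(f"s^2 = {s2:.2f}, se = {se:.2f}, t = {t:.2f}, "
      f"95% limits {limits[0]:.1f} to {limits[1]:.1f} gms.")
```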

EXAMPLE 4.10.1—The following are the rates of diffusion of carbon dioxide through
two soils of different porosity (9). Through a fine soil (f): 20, 31, 18, 23, 23, 28, 23, 26, 27,
26, 12, 17, 25; through a coarse soil (c): 19, 30, 32, 28, 15, 26, 35, 18, 25, 27, 35, 34. Show
that pooled $s^2 = 35.83$, $s_{\bar X_1 - \bar X_2} = 2.40$, d.f. = 23, and $t = 1.67$. The difference, therefore,
is not significant.

EXAMPLE 4.10.2—The total nitrogen content of the blood plasma of normal albino
rats was measured at 37 and 180 days of age (10). The results are expressed as gms. per 100
cc. of plasma. At age 37 days, 9 rats had 0.98, 0.83, 0.99, 0.86, 0.90, 0.81, 0.94, 0.92, and
0.87; at age 180 days, 8 rats had 1.20, 1.18, 1.33, 1.21, 1.20, 1.07, 1.13, and 1.12 gms. per 100
cc. Since significance is obvious, set a 95% confidence interval on the population mean
difference. Ans. 0.21 to 0.35 gms./100 cc.

EXAMPLE 4.10.3—Sometimes, especially in comparisons made from surveys, the two
samples are large. Time is saved by forming frequency distributions and computing the
means and variances as in section 3.11. The following data from an experiment serve as an
illustration. The objective was to compare the effectiveness of two antibiotics, A and B, for
treating patients with lobar pneumonia. The numbers of patients were 59 and 43. The data
are the numbers of days needed to bring the patient’s temperature down to normal.


No. of Days           1    2    3    4    5    6    7    8    9   10   Total
No. of Patients  A   17    8    5    9    7    1    2    1    2    7      59
                 B   15    8    8    5    3    1    0    0    0    3      43


What are your conclusions about the relative effectiveness of the two antibiotics in bringing
down the fever? Ans. The difference of about 1 day in favor of B has a P value between 0.05
and 0.025. Note that although these are frequency distributions, the only real grouping
is in the 10-day groups, which actually represented “at least 10” and were arbitrarily rounded
to 10. Since the distributions are very skew, the analysis leans heavily on the Central Limit
Theorem. Do the variances given by the two drugs appear to differ?
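Computing a mean and variance from a frequency distribution amounts to weighting each class value by its frequency. The following is a direct sketch of that idea (section 3.11 may organize the arithmetic differently, for example with coded values). It assumes class values 1 to 10 days, treating the last class as exactly 10 as the example describes; the function name is our own. It prints the means and variances needed for the last question, but makes no allowance for the skewness mentioned above.

```python
def grouped_mean_var(values, freqs):
    """Total count, mean, and sample variance (divisor n - 1) from a frequency table."""
    n = sum(freqs)
    total = sum(f * x for x, f in zip(values, freqs))
    total_sq = sum(f * x * x for x, f in zip(values, freqs))
    mean = total / n
    var = (total_sq - total ** 2 / n) / (n - 1)
    return n, mean, var


days = list(range(1, 11))   # the 10-day class stands for "at least 10"
drug_a = [17, 8, 5, 9, 7, 1, 2, 1, 2, 7]
drug_b = [15, 8, 8, 5, 3, 1, 0, 0, 0, 3]

for name, freqs in (("A", drug_a), ("B", drug_b)):
    n, m, v = grouped_mean_var(days, freqs)
    print(f"{name}: n = {n}, mean = {m:.2f} days, s^2 = {v:.2f}")
```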

EXAMPLE 4.10.4—Show that if the two samples are of sizes 6 and 12, the S.D. of the
difference in means is the same as when the samples are both of size 8. Are the d.f. in the
pooled $s^2$ the same?

EXAMPLE 4.10.5—Show that the pooled $s^2$ is a weighted mean of $s_1^2$ and $s_2^2$ in which
each is weighted by its number of d.f.


4.11 — Paired versus independent groups. The formula for the variance
of a difference throws more light on the circumstances in which
pairing is effective. Quoting formula (4.7.1),

$$\sigma^2_{X_1 - X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} - 2\,\text{Avg. of }(X_1 - \mu_1)(X_2 - \mu_2)$$

When pairing, we try to choose pairs such that if $X_1$ is high, so is $X_2$.
Thus, if $(X_1 - \mu_1)$ is positive, so is $(X_2 - \mu_2)$, and their product
$(X_1 - \mu_1)(X_2 - \mu_2)$ is positive. Similarly, in successful pairing, when
$(X_1 - \mu_1)$ is negative, $(X_2 - \mu_2)$ will usually also be negative. Their
product $(X_1 - \mu_1)(X_2 - \mu_2)$ is again positive. For paired samples, then,
the average of this product is positive. This helps, because it makes the
variance of $(X_1 - X_2)$ less than the sum of their variances, sometimes very
much less. The average value of the product over the population is
called the covariance of $X_1$ and $X_2$, and is studied in chapter 7. The result
for the variance of a difference may now be written

$$\sigma^2_{X_1 - X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} - 2\,\text{Cov}(X_1, X_2)$$
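The identity can be checked numerically on any paired data, since it also holds exactly for sample variances and covariances computed with the same divisor. This sketch uses the route times of Example 4.11.2 (later in this section); the helper names are illustrative. The positive covariance produced by pairing on days of the week is what makes $s_D^2$ much smaller than $s_A^2 + s_B^2$.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance with divisor n - 1 (cov(xs, xs) is the usual s^2)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Times (minus 23 mins.) for routes A and B, paired by day, from Example 4.11.2
route_a = [5.7, 3.2, 1.8, 2.3, 2.1, 0.9, 3.1, 2.8, 7.3, 8.4]
route_b = [2.4, 2.8, 1.9, 2.0, 0.9, 0.3, 3.6, 1.8, 5.8, 7.3]
diffs = [a - b for a, b in zip(route_a, route_b)]

var_a, var_b = cov(route_a, route_a), cov(route_b, route_b)
c = cov(route_a, route_b)
var_d = cov(diffs, diffs)
print(f"s_A^2 + s_B^2 = {var_a + var_b:.2f}, 2 Cov = {2 * c:.2f}, s_D^2 = {var_d:.2f}")
```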
Pairing is not always effective, because $X_1$ and $X_2$ may be poorly
correlated. Fortunately, it is possible from the results of a paired
experiment to estimate what the standard error of $(\bar X_1 - \bar X_2)$ would have been
if the experiment had been conducted as two independent groups. By
this calculation the investigator can appraise the success of his pairing,
which guides him in deciding whether the pairing is worth continuing in
future experiments.

With paired samples of size n, the standard error of the mean
difference $\bar D = \bar X_1 - \bar X_2$ is $\sigma_D/\sqrt{n}$, where $\sigma_D$ is the standard deviation of the
population of paired differences (section 4.3). For an experiment with
two independent groups, the standard error of $\bar X_1 - \bar X_2$ is $\sqrt{2}\,\sigma/\sqrt{n}$, where
$\sigma$ is the standard deviation of the original population from which we drew
the sample of size $2n$ (section 4.7). Omitting the $\sqrt{n}$, the quantities that
we want to compare are $\sigma_D$ and $\sqrt{2}\,\sigma$. Usually, the comparison is made in
terms of variances: we compare $\sigma_D^2$ with $2\sigma^2$.

From the statistical analysis of the paired experiment, we have an
unbiased estimate $s_D^2$ of $\sigma_D^2$. The problem is to obtain an estimate of
$2\sigma^2$. One possibility is to analyze the results of the paired experiment by
the method of section 4.9 for two independent samples, using the pooled
$s^2$ as an estimate of $\sigma^2$. This procedure gives a good approximation when
n is large, but is slightly wrong, because the two samples from which $s^2$
was computed were not independent. An unbiased estimate of $2\sigma^2$ is
given by the formula


$$\widehat{2\sigma^2} = 2s^2 - \frac{2s^2 - s_D^2}{2n - 1}$$

(The ‘hat’ [ ^ ] placed above a population parameter is often used in mathematical
statistics to denote an estimate of that parameter.)

Let us apply this method to the paired experiment on virus lesions
(table 4.3.1, p. 95), which gave $s_D^2 = 18.57$. You may verify that the
pooled $s^2$ is 45.714, giving $2s^2 = 91.43$. Hence, an unbiased estimate of
$2\sigma^2$ is

$$\widehat{2\sigma^2} = 91.43 - \frac{91.43 - 18.57}{15} = 86.57$$

The pairing has given a much smaller variance of the mean difference, 
18.57/n versus 86.57/n. What does this imply in practical terms? With 
independent samples, the sample size would have to be increased from 8 
pairs to 8(86.57)/(18.57), or about 37 pairs, in order to give the same 
variance of the mean difference as does the paired experiment. The saving
in amount of work due to pairing is large in this case. 

The computation overlooks one point. In the paired experiment,
$s_D^2$ has 7 d.f., whereas the pooled $s^2$ would have 14 d.f. for error. The
t-value used in tests of significance or in computing confidence limits
would be slightly smaller with independent samples than with paired
samples. Several writers (11), (12), (13) have discussed the allowance
that should be made for this difference in number of d.f. We suggest a
rule given by Fisher (12). Multiply the estimated variance by
$(f + 3)/(f + 1)$, where f is the number of d.f. that the experimental plan provides.
Thus we compare

$$\frac{(18.57)(10)}{8} = 23.2 \quad \text{with} \quad \frac{(86.57)(17)}{15} = 98.1$$

D. R. Cox (13) suggests the multiplier $(f + 1)^2/f^2$. This gives almost the
same results, imposing a slightly higher penalty when f is small.
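The comparison for the virus-lesion experiment can be scripted from the three summary figures quoted above ($n = 8$, $s_D^2 = 18.57$, pooled $s^2 = 45.714$). A minimal sketch, with function names of our own choosing; it applies the unbiased estimate of $2\sigma^2$, Fisher's multiplier, and, for comparison, Cox's multiplier.

```python
def unbiased_two_sigma_sq(pooled_s2, s2_d, n):
    """Unbiased estimate of 2*sigma^2 from a paired experiment with n pairs:
    2s^2 - (2s^2 - s_D^2) / (2n - 1)."""
    two_s2 = 2 * pooled_s2
    return two_s2 - (two_s2 - s2_d) / (2 * n - 1)


def fisher_adjust(variance, f):
    """Fisher's allowance for an error variance with f d.f.: times (f + 3)/(f + 1)."""
    return variance * (f + 3) / (f + 1)


def cox_adjust(variance, f):
    """Cox's alternative multiplier (f + 1)^2 / f^2."""
    return variance * (f + 1) ** 2 / f ** 2


n = 8                 # pairs in the virus-lesion experiment
s2_d = 18.57          # variance of the paired differences
pooled_s2 = 45.714    # pooled s^2 when the pairing is ignored

indep = unbiased_two_sigma_sq(pooled_s2, s2_d, n)
paired_adj = fisher_adjust(s2_d, n - 1)            # paired plan: 7 d.f.
indep_adj = fisher_adjust(indep, 2 * (n - 1))      # independent plan: 14 d.f.
pairs_needed = n * indep / s2_d                    # before any d.f. allowance

print(f"2 sigma^2 estimate = {indep:.2f}")
print(f"Fisher: {paired_adj:.1f} vs {indep_adj:.1f}; "
      f"Cox: {cox_adjust(s2_d, n - 1):.1f} vs {cox_adjust(indep, 2 * (n - 1)):.1f}")
print(f"independent pairs needed: about {pairs_needed:.0f}")
```

The results agree with the text: about 86.57 for $2\sigma^2$, 23.2 versus 98.1 after Fisher's allowance, and about 37 independent pairs to match 8 pairs.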

From a single experiment a comparison like the above is not very 
precise, particularly if n is small. The results of several paired experi- 
ments in which the same criterion for pairing was employed give a more 
accurate picture of the success of the pairing. If the criterion has no cor- 
relation with the response variable, there is a small loss in accuracy from 
pairing due to the adjustment for d.f. There may even be a substantial 
loss in accuracy if the criterion is badly chosen so that members of a pair 
are negatively correlated. 

When analyzing the results of a comparison of two procedures, the 
investigator must know whether his samples are paired or independent 
and must use the appropriate analysis. Sometimes a worker with paired 
data forgets this when it comes to analysis, and carries out the statistical 
analysis as if the two samples were independent. This is a serious mistake 
if the pairing has been effective. In the virus lesions example, he would
be using $2s^2/n$, or 91.43/8 = 11.43, as the variance of $\bar D$ instead of
18.57/8 = 2.32. The mistake throws away all the advantage of the
pairing. Differences that are actually significant may be found non-significant,
and confidence intervals will be too wide. 

Analysis of independent samples as if they were paired seems to be 
rare in practice. If the members of each sample are in essentially random 
order, so that the pairs are a random selection, the computed $s_D^2$ may be
shown to be an unbiased estimate of $2\sigma^2$. Thus the analysis still provides
an unbiased estimate of the variance of $(\bar X_1 - \bar X_2)$ and a valid t-test.
There is a slight loss in sensitivity, since t-tests are based on $(n - 1)$ d.f.
instead of $2(n - 1)$ d.f.

As regards assumptions, pairing has the advantage that its t-test does
not require $\sigma_1 = \sigma_2$. “Random” pairing of independent samples has been
suggested as a means of obtaining tests and confidence limits when the
investigator knows that $\sigma_1$ and $\sigma_2$ are unequal.

Artificial pairing of the results, by arranging each sample in
descending order and pairing the top two, the next two, and so on, produces a
great underestimation of the true variance of D. This effect may be
illustrated by the first two random samples of pig gains from table 3.3.1
(p. 69). The population variance $\sigma^2$ is 100, giving $2\sigma^2 = 200$. In table
4.11.1 this method of artificial pairing has been employed.

Instead of the correct value of 200 for $2\sigma^2$ we get an estimate $s_D^2$ of
only 8.0. Since $s_{\bar D} = \sqrt{8.0/10} = 0.894$, the t-value for testing $\bar D$ is
$t = 6.3/0.894 = 7.04$, with 9 d.f. This gives a P value of much less than
0.1%, although the two samples were drawn from the same population.



TABLE 4.11.1

Two Samples of 10 Pig Gains Arranged in Descending Order, to Illustrate
the Erroneous Conclusions From Artificial Pairing

Sample 1    57   53   39   39   36   34   33   29   24   12    Mean = 35.6
Sample 2    53   44   32   31   30   30   24   19   19   11    Mean = 29.3
Diff.        4    9    7    8    6    4    9   10    5    1    Mean = 6.3

$\Sigma d^2 = 469 - (63)^2/10 = 72.1$,   $s_D^2 = 72.1/9 = 8.0$
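The effect of artificial pairing is easy to reproduce. This sketch takes the two samples of table 4.11.1, sorts each in descending order, and analyzes the sorted pairs as if they came from a paired experiment; it then analyzes the same numbers correctly as two independent samples. Names are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def ss(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


# The two samples of pig gains in table 4.11.1 (same population, sigma^2 = 100)
sample1 = [57, 53, 39, 39, 36, 34, 33, 29, 24, 12]
sample2 = [53, 44, 32, 31, 30, 30, 24, 19, 19, 11]
n = len(sample1)

# Wrong: sort each sample, pair rank with rank, and treat as a paired experiment.
d = [a - b for a, b in zip(sorted(sample1, reverse=True), sorted(sample2, reverse=True))]
s2_d = ss(d) / (n - 1)
t_paired = mean(d) / math.sqrt(s2_d / n)                           # 9 d.f.

# Right: analyze as two independent samples with a pooled s^2.
s2 = (ss(sample1) + ss(sample2)) / (2 * (n - 1))
t_indep = (mean(sample1) - mean(sample2)) / math.sqrt(2 * s2 / n)  # 18 d.f.

print(f"artificial pairing: s_D^2 = {s2_d:.1f}, t = {t_paired:.2f}")
print(f"independent samples: 2s^2 = {2 * s2:.1f}, t = {t_indep:.2f}")
```

The wrongly paired analysis gives $s_D^2 \approx 8.0$ and $t \approx 7.04$; the independent-samples analysis of the same data gives $t \approx 1.11$ with 18 d.f., nowhere near significance.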



EXAMPLE 4.11.1 — In planning experiments to test the effects of two pain-deadeners 
on the ability of young men to tolerate pain from a narrow beam of light directed at the arm, 
each subject was first rated several times as to the amount of heat energy that he bore with- 
out complaining of discomfort. The subjects were then paired according to these initial 
scores. In a later experiment the amounts of energy received at the point at which the sub- 
ject complained were as follows, A and B denoting the treatments. 


Pair    1    2    3    4    5    6    7    8    9   Sums
A      15    2    4    1    5    7    1    0   -3     32
B       6    7    3    4    3    2    3    0   -6     22


To simplify calculations, 30 was subtracted from each original score. Show that for
appraising the effectiveness of the pairing, comparable variances are 22.5 for the paired
experiment and 44.6 for independent groups (after allowing for the difference in d.f.). The
preliminary work in rating the subjects reduced the number of subjects needed by almost
one-half.


EXAMPLE 4.11.2—In a previous experiment comparing two routes A and B for driving
home from an office (example 4.3.4), pairing was by days of the week. The times taken
(minus 23 mins.) for the ten pairs were as follows:


A       5.7   3.2   1.8   2.3   2.1   0.9   3.1   2.8   7.3   8.4
B       2.4   2.8   1.9   2.0   0.9   0.3   3.6   1.8   5.8   7.3
Diff.   3.3   0.4  -0.1   0.3   1.2   0.6  -0.5   1.0   1.5   1.1


Show that if the ten nights on which route A was used had been drawn at random from the 
twenty nights available, the variance of the mean difference would have been about 8 times 
as high as with this pairing.
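Examples 4.11.1 and 4.11.2 both ask for the comparison made earlier for the virus lesions, this time starting from raw paired data. A sketch, assuming the unbiased estimate of $2\sigma^2$ and Fisher's $(f + 3)/(f + 1)$ allowance described in this section; `pairing_comparison` is a hypothetical helper name.

```python
def mean(xs):
    return sum(xs) / len(xs)


def ss(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pairing_comparison(a, b):
    """Fisher-adjusted variances (paired, independent) for n pairs (a[i], b[i]).

    paired:      s_D^2 * (f + 3)/(f + 1), f = n - 1
    independent: unbiased estimate of 2*sigma^2 times (f + 3)/(f + 1), f = 2(n - 1)
    """
    n = len(a)
    s2_d = ss([x - y for x, y in zip(a, b)]) / (n - 1)
    two_s2 = 2 * (ss(a) + ss(b)) / (2 * (n - 1))          # 2 x pooled s^2
    two_sigma2 = two_s2 - (two_s2 - s2_d) / (2 * n - 1)   # unbiased estimate
    f_p, f_i = n - 1, 2 * (n - 1)
    return s2_d * (f_p + 3) / (f_p + 1), two_sigma2 * (f_i + 3) / (f_i + 1)


route_a = [5.7, 3.2, 1.8, 2.3, 2.1, 0.9, 3.1, 2.8, 7.3, 8.4]
route_b = [2.4, 2.8, 1.9, 2.0, 0.9, 0.3, 3.6, 1.8, 5.8, 7.3]
paired, indep = pairing_comparison(route_a, route_b)
print(f"paired {paired:.2f}, independent {indep:.2f}, ratio {indep / paired:.1f}")
```

For the route times the ratio comes out near 8.7, in line with "about 8 times"; applied to the pain-tolerance scores of Example 4.11.1 the same function gives about 22.4 and 44.7, close to the 22.5 and 44.6 quoted there.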

EXAMPLE 4.11.3—If pairing has not reduced the variance, so that $\sigma_D^2 = 2\sigma^2$, show
that allowance for the error d.f. by Fisher’s rule makes pairing 15% less effective than
independent groups when n = 5 and 9% less effective when n = 10. In small experiments,
pairing is inadvisable unless a sizeable reduction in variance is expected.
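When pairing leaves the variance unchanged, the two plans differ only in the d.f. allowance, so the penalty is a ratio of Fisher's multipliers. A small sketch with illustrative function names:

```python
def fisher_factor(f):
    """Fisher's multiplier (f + 3)/(f + 1) for an error variance with f d.f."""
    return (f + 3) / (f + 1)


def pairing_penalty(n):
    """Relative loss from pairing n pairs when sigma_D^2 = 2*sigma^2: the paired
    plan has n - 1 error d.f., the independent plan has 2(n - 1)."""
    return fisher_factor(n - 1) / fisher_factor(2 * (n - 1))


for n in (5, 10):
    loss = 100 * (pairing_penalty(n) - 1)
    print(f"n = {n}: pairing about {loss:.0f}% less effective")
```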

4.12 — Precautions against bias-randomization. With either independent
or paired samples, the analysis assumes that the difference
$(\bar X_1 - \bar X_2)$ is an unbiased estimate of the population mean difference
between the two treatments. Unless precautions are taken when
conducting an experiment, $(\bar X_1 - \bar X_2)$ may be subject to a bias of unknown
amount that makes the conclusion false. Corner (14) describes an
example in which, when picking rabbits out of a hutch, one worker tended
to pick large rabbits, another to pick small rabbits, although neither was 
aware of his personal bias. If the rabbits for treatment A are picked out 
first, a bias will be introduced if the final response depends on the weight 
of the rabbit. If the animals receiving treatment A are kept in one cage 
and those having B in another, temperature, draftiness, or sources of in- 
fection in one cage may affect all the animals receiving A differently 
from those receiving B. When the application of the treatment or the 
measurement of response takes considerable time, unsuspected time trends 
may be present, producing bias if all replicates of treatment A are pro- 
cessed first. The investigator must be constantly on guard against such 
sources of bias. 

One helpful device, now commonly used, is randomization. When 
pairs have been formed, the decision as to which member of a pair re- 
ceives treatment A is made by tossing a coin or by using a table of random 
numbers. If the random number drawn is odd, the first member of the 
pair will receive treatment A. With 10 pairs, we draw 10 random digits 
from table A 1, say 9, 8, 0, 1, 8, 3, 6, 8, 0, 3. In pairs 1, 4, 6, and 10, treat- 
ment A is given to the first member of the pair and B to the second member. 
In the remaining pairs, the first member receives B. 
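The rule for pairs translates directly into code. A sketch, assuming the digits are supplied from a random-number table as in the text; for a fresh experiment the standard-library `random` module can supply them instead. The function name is our own.

```python
import random


def assign_within_pairs(digits):
    """For pair i, give treatment A to the first member if digit i is odd,
    otherwise to the second member. Returns (first, second) labels per pair."""
    return [("A", "B") if d % 2 == 1 else ("B", "A") for d in digits]


# The ten digits quoted in the text
plan = assign_within_pairs([9, 8, 0, 1, 8, 3, 6, 8, 0, 3])
first_gets_a = [i + 1 for i, (first, _) in enumerate(plan) if first == "A"]
print(first_gets_a)   # [1, 4, 6, 10]

# For a new experiment, draw fresh digits instead of reusing a printed table
fresh = assign_within_pairs([random.randrange(10) for _ in range(10)])
```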

With independent samples, random numbers are used to divide the 
2n subjects into two groups of n. Number the subjects in any order from
1 to 2n. Proceed down a column of random numbers, allotting the
subject to A if the number is odd, to B if even, continuing until n A’s or n B’s
have been allotted. With 14 subjects and the same random numbers as 
above, subjects 1, 4, 6, and 10 receive A and subjects 2, 3, 5, 7, 8, and 9 
receive B. Thus far we have allotted four A’s and six B’s, so that more 
random numbers must be drawn. The next two in the column are 1, 8. 
Subject 11 gets A and subject 12 gets B. Since seven B’s have been as- 
signed we stop, giving A to subjects 13 and 14. 
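The allotment rule for independent samples can be sketched the same way. This version (illustrative names, standard-library Python) reproduces the 14-subject allocation just described from the twelve digits 9, 8, 0, 1, 8, 3, 6, 8, 0, 3, 1, 8.

```python
def allot_independent(digits, n):
    """Split subjects 1..2n into groups A and B of n each using successive
    random digits (odd -> A, even -> B). Once one group is full, the
    remaining subjects all go to the other group."""
    groups = {"A": [], "B": []}
    subject = 1
    digit_iter = iter(digits)
    while len(groups["A"]) < n and len(groups["B"]) < n:
        d = next(digit_iter)        # raises StopIteration if the digits run out
        groups["A" if d % 2 == 1 else "B"].append(subject)
        subject += 1
    rest = "B" if len(groups["A"]) == n else "A"
    groups[rest].extend(range(subject, 2 * n + 1))
    return groups


g = allot_independent([9, 8, 0, 1, 8, 3, 6, 8, 0, 3, 1, 8], n=7)
print("A:", g["A"], "B:", g["B"])
```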

Randomization gives each treatment an equal chance of being al- 
lotted to any subject that happens to give an unusually good or unusually 
poor response, exactly as assumed in the theory of probability on which 
the statistical analysis is based. Randomization does not guarantee to 
balance out the natural differences between the members of a pair exactly. 
With n pairs, there is a small probability, $1/2^{n-1}$, that one treatment will be
assigned to the superior member in every pair. With 10 pairs this prob- 
ability is about 0.002. If the experimenter can predict which is likely to 
be the superior member in each pair, he should try a more sophisticated 
design (chapter 11) that utilizes this information more effectively than 
randomization. Randomization serves primarily to protect against 
sources of bias that are unsuspected. Randomization can be used not 
merely in the allocation of treatments to subjects, but at any later stage in 
which it may be a safeguard against bias, as discussed in (11), (13). 

Both independent and paired samples are much used in comparisons 
made from surveys. The problem of avoiding misleading conclusions is
formidable with survey data (15). Suppose we tried to learn something 
about the value of completing a high school education by comparing, 
some years later, the incomes, job satisfaction, and general well-being of 
a group of boys who completed high school with a group from the same 
schools who started but did not finish. Obviously, significant differences 
found between the sample means may be due to factors other than the 
completion of high school in itself: differences in the natural ability and
personal characteristics of the boys, in the parents’ economic level and
number of useful contacts, and so on. Pairing the subjects on their school
performance and parents’ economic level helps, but no randomization
within pairs is possible, and a significant mean difference may still be due 
to extraneous factors whose influence has been overlooked. 

Remember that a significant t-value is evidence that the population
means differ. Popular accounts are sometimes written as if a
significant t implies that every member of population 1 is superior to every
member of population 2: “The oldest child in the family achieves more
in science or in business.” In fact, the two populations may largely
overlap even though t is significant.

4.13 — Sample size in comparative experiments. In planning an
experiment to compare two treatments, the following method is often used
to estimate the size of sample needed. The investigator first decides on a
value $\delta$ which represents the size of difference between the true effects of
the treatments that he regards as important. If the true difference is as
large as $\delta$, he would like the experiment to have a high probability of
showing a statistically significant difference between the treatment means.
Probabilities of 0.80 and 0.90 are common. A higher probability, say
0.95 or 0.99, can be set, but the sample size required to meet these severer
specifications is often too expensive.

This way of stating the aims in planning the sample size is particularly
appropriate when (i) the treatments are a standard treatment and a new
treatment that the experimenter hopes will be better than the standard,
and (ii) he intends to discard the new treatment if the experiment does not
show it to be significantly superior to the standard. In these
circumstances he does not mind dropping the new treatment if it is at most only
slightly better than the standard, but he does not want to drop it, on the
evidence of the experiment, if it is substantially superior. The value of $\delta$
measures his idea of a substantial true difference.

In order to make the calculation the experimenter supplies:

1. the value of $\delta$,

2. the desired probability $P'$ of obtaining a significant result if the
true difference is $\delta$,

3. the significance level $\alpha$ of the test, which may be either one-tailed
or two-tailed.

Consider paired samples. Assume at first that $\sigma_D$ is known and that
the test is two-tailed. In our specification, the observed mean difference
$\bar D = \bar X_1 - \bar X_2$ is normally distributed about $\delta$ with standard deviation
$\sigma_D/\sqrt{n}$. This distribution is shown in figure 4.13.1, which forms the
basis of our explanation. We have assumed $\delta > 0$.
Fig. 4.13.1 — Frequency distribution of the mean difference $\bar D$ between two treatments.


In order to be statistically significant, $\bar D$ must exceed $Z_\alpha \sigma_D/\sqrt{n}$, where
$Z_\alpha$ is the normal deviate corresponding to the two-tailed significance level
$\alpha$. (For $\alpha$ = 0.01, 0.05, 0.10, the values of $Z_\alpha$ are 2.576, 1.960, and 1.645,
respectively.) The vertical line in figure 4.13.1 shows the critical value.

In our specification, the probability that $\bar D$ exceeds this value must
be $P'$. That is, this value divides the frequency distribution of $\bar D$ into an
area $P'$ on the right and $(1 - P')$ on the left. Consider the standard
normal curve, with mean 0 and S.D. 1. With $P' > 1/2$, the point at which
the area on the left is $(1 - P')$ is minus the normal deviate corresponding
to a one-tailed significance level $(1 - P')$. This is the same as minus the
normal deviate corresponding to a two-tailed significance level $2(1 - P')$,
or in our notation $-Z_{2(1-P')}$. For instance, with $P' = 0.9$, this is the
normal deviate $-Z_{0.2}$, or $-1.282$.

Since $\bar D$ has mean $\delta$ and S.D. $\sigma_D/\sqrt{n}$, the quantity $(\bar D - \delta)/(\sigma_D/\sqrt{n})$
follows the standard normal curve. Hence, the value of $\bar D$ that is exceeded
with probability $P'$ is given by the equation

$$\frac{\bar D - \delta}{\sigma_D/\sqrt{n}} = -Z_{2(1-P')}$$

or,

$$\bar D = \delta - Z_{2(1-P')}\,\sigma_D/\sqrt{n}$$

It follows that our specification is satisfied if

$$Z_\alpha\,\sigma_D/\sqrt{n} = \delta - Z_{2(1-P')}\,\sigma_D/\sqrt{n}$$


A look at figure 4.13.1 may help at this point. Write $\beta = 2(1 - P')$ and
solve for n.

$$n = (Z_\alpha + Z_\beta)^2\,\sigma_D^2/\delta^2 \qquad (4.13.1)$$

To illustrate, for a one-tailed test at the 5% level with $P' = 0.90$, we
have $Z_\alpha = 1.645$, $Z_\beta = 1.282$, giving $n = 8.6\,\sigma_D^2/\delta^2$. Note that n is the
size of each sample, the total number of observations being 2n.

Formula (4.13.1) for n remains the same for independent samples,
except that $\sigma_D^2$ is replaced by $2\sigma^2$.

The two-tailed case involves a slight approximation. In a two-tailed
test, $\bar D$ in figure 4.13.1 is also significant if it is less than $-Z_\alpha \sigma_D/\sqrt{n}$.
But with $\delta$ positive, the probability that this happens is negligible in most
practical situations.

Table 4.13.1 presents the multipliers $(Z_\alpha + Z_\beta)^2$ that are most
frequently used.
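The entries of table 4.13.1 follow from $(Z_\alpha + Z_\beta)^2$ with normal deviates. The sketch below computes them with `statistics.NormalDist.inv_cdf` from the Python standard library; the function name `multiplier` is our own. Computed values may differ from the printed table in the last digit: for a two-tailed 5% test with $P' = 0.80$ the unrounded multiplier is about 7.85, printed as 7.9.

```python
from statistics import NormalDist


def multiplier(alpha, power, two_tailed=True):
    """(Z_alpha + Z_beta)^2 for significance level alpha and probability
    `power` (P') of a significant result when the true difference is delta."""
    z = NormalDist()                        # standard normal curve
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = z.inv_cdf(1 - tail)
    z_beta = z.inv_cdf(power)               # equals the two-tailed deviate for 2(1 - P')
    return (z_alpha + z_beta) ** 2


levels = (0.01, 0.05, 0.10)
for power in (0.80, 0.90, 0.95):
    row = [multiplier(a, power) for a in levels]
    row += [multiplier(a, power, two_tailed=False) for a in levels]
    print(f"{power:.2f}", " ".join(f"{m:5.1f}" for m in row))
```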

When $\sigma_D$ and $\sigma$ are estimated from the results of the experiment,
t-tests replace the normal deviate tests. The logical basis of the argument
remains the same, but the formula for n becomes an integral equation in
calculus that must be solved by successive approximation. This equa- 
tion was given by Neyman (21) to whom this method of determining 
sample size is due. 

For practical purposes, the following approximation agrees well 
enough with the values of n as found from Neyman’s solution: 

1. Find $n_1$ to one decimal place by table 4.13.1.

TABLE 4.13.1

Multipliers of $\sigma_D^2/\delta^2$ in Paired Samples, and of $2\sigma^2/\delta^2$ in Independent Samples,
in Order to Determine the Size of Each Sample

              Two-tailed Tests            One-tailed Tests
                   Level                       Level
  P′       0.01    0.05    0.10       0.01    0.05    0.10
 0.80      11.7     7.9     6.2       10.0     6.2     4.5
 0.90      14.9    10.5     8.6       13.0     8.6     6.6
 0.95      17.8    13.0    10.8       15.8    10.8     8.6
2. Calculate f, the number of degrees of freedom supplied by an
experiment of this size (rounding $n_1$ upwards for this step).

3. Multiply $n_1$ in step 1 by $(f + 3)/(f + 1)$.

To illustrate, suppose that a 10% difference $\delta$ is regarded as important
and that $P' = 0.80$ in a two-tailed 5% test of significance. The samples
are to be independent, and past experience has shown that $\sigma$ is about 6%.
The multiplier for $P' = 0.80$ and a 5% two-tailed test in table 4.13.1 is 7.9.
Since $2\sigma^2/\delta^2 = 72/100 = 0.72$, $n_1 = (7.9)(0.72) = 5.7$. With a sample size
of 6 in each group, f = 10. Hence we take n = (13)(5.7)/11 = 6.7, which
we round up to 7.
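The three-step approximation can be combined with formula (4.13.1) in one function. A sketch under two assumptions of our own: the multiplier is computed from normal deviates instead of being read from table 4.13.1, and "rounding up" is done with `math.ceil`. The parameter `var_ratio` stands for $\sigma_D^2/\delta^2$ (paired) or $2\sigma^2/\delta^2$ (independent); all names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size(var_ratio, alpha, power, two_tailed=True, paired=False, sigma_known=False):
    """Size of each sample from formula (4.13.1), plus the three-step allowance
    for estimating sigma from the experiment itself.

    var_ratio is sigma_D^2/delta^2 for paired samples or 2*sigma^2/delta^2
    for independent samples.
    """
    z = NormalDist()
    tail = alpha / 2 if two_tailed else alpha
    mult = (z.inv_cdf(1 - tail) + z.inv_cdf(power)) ** 2   # (Z_alpha + Z_beta)^2
    n1 = round(mult * var_ratio, 1)                        # step 1: one decimal place
    if sigma_known:
        return math.ceil(n1)                               # normal-deviate test
    trial = math.ceil(n1)                                  # step 2: round up ...
    f = trial - 1 if paired else 2 * (trial - 1)           # ... to get the error d.f.
    return math.ceil(n1 * (f + 3) / (f + 1))               # step 3, rounded up


print(sample_size(72 / 100, 0.05, 0.80))                   # independent, two-tailed 5%
for ratio in (144 / 100, 144 / 25):                        # Example 4.13.2, (i) and (ii)
    print(sample_size(ratio, 0.05, 0.90, two_tailed=False, paired=True, sigma_known=True),
          sample_size(ratio, 0.05, 0.90, two_tailed=False, paired=True))
```

The first call returns 7, matching the illustration above, and the paired one-tailed calls reproduce the answers 13, 15, 50, and 52 given for Example 4.13.2.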

Note that the experimenter must still guess a value of $\sigma_D$ or $\sigma$.
Usually it is easier to guess $\sigma$. If pairing is to be used but is expected to
be only moderately effective, take $\sigma_D = \sqrt{2}\,\sigma$, reducing this value if
something more definite is known about the effectiveness of pairing. This
uncertainty is the chief source of inaccuracy in the process.

The preceding method is designed to protect the investigator against 
finding a non-significant result and consequently dropping a new treat- 
ment that is actually effective, because his experiment was too small. The 
method is therefore most useful in the early stages of a line of work. At 
later stages, when something has been learned about the sizes of dif- 
ferences produced by new treatments, we may wish to specify the size of 
the standard error or the half-width of the confidence interval that will be 
attached to an estimated difference. 

For example, previous small experiments have indicated that a new
treatment gives an increase of around 20%, and $\sigma$ is around 7%. The
investigator would like to estimate this increase, in his next experiment, with
a standard error of ±2%. He sets $\sqrt{2}(7)/\sqrt{n} = 2$, giving n = 25 in each
group. This type of rough calculation is often helpful in later work.
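Specifying a standard error, or the half-width of a confidence interval, leads to an even simpler calculation. A sketch with illustrative names, using normal deviates (so $\sigma$ is treated as known). The first call repeats the calculation just given; the second answers Example 4.13.4 below.

```python
import math
from statistics import NormalDist


def n_for_standard_error(sigma_unit, target_se):
    """Smallest n with sigma_unit / sqrt(n) <= target_se.

    sigma_unit is sqrt(2)*sigma for independent groups or sigma_D for pairs.
    """
    return math.ceil((sigma_unit / target_se) ** 2)


def n_for_half_width(sigma_unit, half_width, confidence):
    """Smallest n making the normal-theory half-width Z * sigma_unit / sqrt(n)
    no larger than half_width (sigma treated as known)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided deviate
    return math.ceil((z * sigma_unit / half_width) ** 2)


print(n_for_standard_error(math.sqrt(2) * 7, 2))   # independent groups, sigma = 7%
print(n_for_half_width(5, 2, 0.90))                # sigma_D = 5, 90% interval
```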

EXAMPLE 4.13.1—In table 4.13.1, verify the multipliers given for a one-tailed test
at the 1% level with $P' = 0.90$ and for a two-tailed test at the 10% level with $P' = 0.80$.

EXAMPLE 4.13.2—In planning a paired experiment, the investigator proposes to
use a one-tailed test of significance at the 5% level, and wants the probability of finding a
significant difference to be 0.90 if (i) $\delta = 10\%$, (ii) $\delta = 5\%$. How many pairs does he need?
In each case, give the answer if (a) $\sigma_D$ is known to be 12%, (b) $\sigma_D$ is guessed as 12%, but a
t-test will be used in the experiment. Ans. (ia) 13, (ib) 15, (iia) 50, (iib) 52.

EXAMPLE 4.13.3—In the previous example, how many pairs would you guess to be
necessary if $\delta = 2.5\%$? The answer brings out the difficulty of detecting small differences
in comparative experiments with variable data.

EXAMPLE 4.13.4—If $\sigma_D = 5$, how many pairs are needed to make the half-width
of the 90% confidence interval for the difference between the two population means = 2?
Ans. n = 17.

4.14 — Analysis of independent samples when $\sigma_1 \neq \sigma_2$. The ordinary
method of finding confidence limits and making tests of significance for
the difference between the means of two independent samples assumes that
the two population variances are the same. Common situations in which
the assumption is suspect are as follows:

(1) When the samples come from populations of different types, as
in comparisons made from survey data. In comparing the average values
of some characteristic of boys from public and private schools, we might
expect, from our knowledge of the differences in the two kinds of schools,
that the variances will not be the same.

(2) When computing confidence limits in cases in which the population
means are obviously widely different. The frequently found result
that $\sigma$ tends to change, although slowly, when $\mu$ changes, would make us
hesitant to assume $\sigma_1 = \sigma_2$.

(3) With samples from populations that are markedly skew. In
many such populations the relation between $\sigma$ and $\mu$ is often relatively
strong.

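When $\sigma_1 \neq \sigma_2$, the formula for the variance of $(\bar X_1 - \bar X_2)$ in independent
samples still holds, namely,

$$\sigma^2_{\bar X_1 - \bar X_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$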
The two samples furnish unbiased estimates $s_1^2$ of $\sigma_1^2$ and $s_2^2$ of $\sigma_2^2$.
Consequently, the ordinary t is replaced by the quantity

$$t' = \frac{\bar X_1 - \bar X_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

This quantity does not follow Student’s t-distribution when $\mu_1 = \mu_2$. Two
different forms of the distribution of $t'$, arising from different theoretical
backgrounds, have been worked out, one due to Behrens (16) and Fisher
(17), the other to Welch and Aspin (18), (22). Both require special tables,
given in the references. The tables differ relatively little, the Behrens-
Fisher table being on the whole more conservative, in the sense that slightly
higher values of $t'$ are required for significance. The following
approximation due to Cochran (19), which uses the ordinary t-table, is
sufficiently accurate for our purposes. It is usually slightly more conservative
than the Behrens-Fisher solution.

Case 1: $n_1 = n_2$. With $n_1 = n_2 = n$, the variance in the denominator
of $t'$ is $(s_1^2 + s_2^2)/n$. But this is just $2s^2/n$, where $s^2$ is the pooled variance.
Thus, in this case, $t' = t$. The rule is: calculate t in the usual way, but
give it $(n - 1)$ d.f. instead of $2(n - 1)$.

Case 2: $n_1 \neq n_2$. Calculate $t'$. To find its significance level, look up
the significance levels of t in table A 4 for $(n_1 - 1)$ and $(n_2 - 1)$ d.f. Call
these values $t_1$ and $t_2$. The significance level of $t'$ is, approximately,

$$\frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}, \qquad w_1 = \frac{s_1^2}{n_1}, \quad w_2 = \frac{s_2^2}{n_2}$$

The following artificial example illustrates the calculations. A quick
but imprecise method of estimating the concentration of a chemical in a
vat has been developed. Eight samples from the vat are analyzed by it, as well
as four samples by the standard method, which is precise but slow. In
comparing the means we are examining whether the quick method gives
a systematic over- or underestimate. Table 4.14.1 gives the computations.


TABLE 4.14.1

A Test of $(\bar X_1 - \bar X_2)$ When $\sigma_1 \neq \sigma_2$
Concentration of a Chemical by Two Methods

                      Standard      Quick
                         25           23
                         24           18
                         25           22
                         26           28
                                      17
                                      25
                                      19
                                      16

Mean, $\bar X$           25           21
$n$                       4            8
$s^2$                  0.67        17.71
$s^2/n$                0.17         2.21

$$t' = \frac{4}{\sqrt{2.38}} = 2.60$$

$t_1$ (3 d.f.) = 3.182        $t_2$ (7 d.f.) = 2.365

$$t'_{0.05} = \text{5\% level of } t' = \frac{(0.17)(3.182) + (2.21)(2.365)}{2.38} = 2.42$$



Since 2.60 > 2.42, the difference is significant at the 5% level; the quick 
method appears to underestimate. 
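As a check on the arithmetic, here is a minimal Python sketch of Cochran's approximation applied to the data of table 4.14.1. The function name `cochran_t_prime` is our own, and the tabular values 3.182 (3 d.f.) and 2.365 (7 d.f.) are typed in from the t table rather than computed, so only the standard library is needed.

```python
from math import sqrt
from statistics import mean, variance

standard = [25, 24, 25, 26]                 # Table 4.14.1, standard method
quick = [23, 18, 22, 28, 17, 25, 19, 16]    # Table 4.14.1, quick method


def cochran_t_prime(x1, x2, t1, t2):
    """Return (t', approximate 5% level of t') by Cochran's weighting.

    t1, t2: tabular t values for (n1 - 1) and (n2 - 1) d.f., supplied by hand.
    """
    w1 = variance(x1) / len(x1)             # s1^2 / n1
    w2 = variance(x2) / len(x2)             # s2^2 / n2
    t_prime = (mean(x1) - mean(x2)) / sqrt(w1 + w2)
    level = (w1 * t1 + w2 * t2) / (w1 + w2)
    return t_prime, level


t_prime, t05 = cochran_t_prime(standard, quick, t1=3.182, t2=2.365)
print(f"t' = {t_prime:.2f}, 5% level = {t05:.2f}, significant: {t_prime > t05}")
```

It prints $t' = 2.59$, which agrees with the 2.60 of the table to within rounding, and a 5% level of 2.42.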

Approximate 95% confidence limits for $(\mu_1 - \mu_2)$ are

$$\bar X_1 - \bar X_2 \pm t'_{0.05}\, s_{\bar X_1 - \bar X_2}$$

or, in this example, $4 \pm (2.42)(1.54) = 4 \pm 3.7$.

The ordinary t-test with a pooled $s^2$ gives $t = 1.84$, to which we would
erroneously attribute 10 d.f. The t-test tends to give too few significant
results when the larger sample has the larger variance, as in this example,
and too many when the larger sample has the smaller variance.
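The confidence limits and the contrasting pooled t can be reproduced in the same way. This sketch assumes the 5% level $t'_{0.05} = 2.42$ found for table 4.14.1 and uses only the standard library; variable names are ours.

```python
from math import sqrt
from statistics import mean, variance

standard = [25, 24, 25, 26]                 # Table 4.14.1
quick = [23, 18, 22, 28, 17, 25, 19, 16]
n1, n2 = len(standard), len(quick)
diff = mean(standard) - mean(quick)         # X1bar - X2bar = 4

# Approximate 95% limits from the unpooled standard error and t'_0.05 = 2.42
se_unpooled = sqrt(variance(standard) / n1 + variance(quick) / n2)
half_width = 2.42 * se_unpooled
print(f"limits: {diff} ± {half_width:.1f}")

# Ordinary t with a pooled s^2 (n1 + n2 - 2 = 10 d.f.), for contrast
pooled_s2 = ((n1 - 1) * variance(standard)
             + (n2 - 1) * variance(quick)) / (n1 + n2 - 2)
t_pooled = diff / sqrt(pooled_s2 * (1 / n1 + 1 / n2))
print(f"pooled t = {t_pooled:.2f}")
```

The pooled s² gives the smaller t because it lets the small, precise standard sample borrow the large variance of the quick method.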

Sometimes, when it seemed reasonable to assume that $\sigma_1 = \sigma_2$, or
when the investigator failed to think about the question in advance, he
notices that $s_1^2$ and $s_2^2$ are distinctly different. A test of the null
hypothesis that $\sigma_1 = \sigma_2$, given in the next section, is useful. If the null
hypothesis is rejected, the origin of the data should be re-examined. This
may reveal some cause for expecting the standard deviations to be
different. In case of doubt it is better to avoid the assumption that $\sigma_1 = \sigma_2$.

4.15 — A test of the equality of two variances. The null hypothesis is
that $s_1^2$ and $s_2^2$ are independent estimates from normal populations
with the same variance $\sigma^2$. In situations in which there is no prior
reason to anticipate inequality of variance, the alternative is a two-sided
one: $\sigma_1 \neq \sigma_2$. The test criterion is $F = s_1^2/s_2^2$, where $s_1^2$ is the larger
mean square. The distribution of F when the null hypothesis is true was
worked out by Fisher (20) early in the 1920's. Like $\chi^2$ and t, it is one of the
basic distributions in modern statistical methods. A condensed two-tailed
table of the 5% significance levels of F is given in table 4.15.1.


TABLE 4.15.1

5% Level (Two-Tailed) of the Distribution of F

Columns: $f_1$ = d.f. for larger mean square. Rows: $f_2$ = d.f. for smaller mean square.

 f2 \ f1     2      4      6      8     10     12     15     20     30      ∞
    2    39.00  39.25  39.33  39.37  39.40  39.42  39.43  39.45  39.46  39.50
    3    16.04  15.10  14.74  14.54  14.42  14.34  14.25  14.17  14.08  13.90
    4    10.65   9.60   9.20   8.98   8.84   8.75   8.66   8.56   8.46   8.26
    5     8.43   7.39   6.98   6.76   6.62   6.52   6.43   6.33   6.23   6.02
    6     7.26   6.23   5.82   5.60   5.46   5.37   5.27   5.17   5.07   4.85
    7     6.54   5.52   5.12   4.90   4.76   4.67   4.57   4.47   4.36   4.14
    8     6.06   5.05   4.65   4.43   4.30   4.20   4.10   4.00   3.89   3.67
    9     5.71   4.72   4.32   4.10   3.96   3.87   3.77   3.67   3.56   3.33
   10     5.46   4.47   4.07   3.85   3.72   3.62   3.52   3.42   3.31   3.08
   12     5.10   4.12   3.73   3.51   3.37   3.28   3.18   3.07   2.96   2.72
   15     4.76   3.80   3.41   3.20   3.06   2.96   2.86   2.76   2.64   2.40
   20     4.46   3.51   3.13   2.91   2.77   2.68   2.57   2.46   2.35   2.09
   30     4.18   3.25   2.87   2.65   2.51   2.41   2.31   2.20   2.07   1.79
    ∞     3.69   2.79   2.41   2.19   2.05   1.94   1.83   1.71   1.57   1.00


Use of the table is illustrated by the bee data in example 4.9.1. Bees
fed a 65% concentration of syrup showed a mean decrease in concentration
of 1.9%, with $s_1^2 = 0.589$, while bees fed a 20% concentration gave
a mean decrease of 0.5%, with $s_2^2 = 0.027$. Each mean square has 9 d.f.
Hence

$$F = 0.589/0.027 = 21.8$$

In the row for 9 d.f. and the column for 9 d.f. (interpolated between 8 and
10), the 5% level of F is 4.03. The null hypothesis is rejected. No clear
explanation of the discrepancy in variances was found, except that it may
reflect the association of a smaller variance with a smaller mean. The
difference between the means is strongly significant whether the variances
are assumed the same or not.
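The bee comparison takes only a few lines. In this sketch the 5% level for 9 and 9 d.f. is obtained, as in the text, by averaging the two neighbouring entries of table 4.15.1; with the rounded mean squares quoted, the ratio is about 21.8.

```python
# Mean squares quoted in the text for the bee data, 9 d.f. each
s1_sq, s2_sq = 0.589, 0.027
F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)   # larger mean square on top

# Row f2 = 9 of table 4.15.1 at columns f1 = 8 and f1 = 10
row_9 = {8: 4.10, 10: 3.96}
F05 = (row_9[8] + row_9[10]) / 2            # linear interpolation for f1 = 9

print(f"F = {F:.1f}, 5% level = {F05:.2f}, reject H0: {F > F05}")
```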

Often a one-tailed test is wanted, because we know, in advance of
seeing the data, which population will have the higher variance if the null
hypothesis is untrue. The numerator of F is $s_1^2$ if $\sigma_1 > \sigma_2$ is the
alternative, and $s_2^2$ if $\sigma_2 > \sigma_1$ is the alternative. Table A 14 presents one-tailed
levels of F directly.
EXAMPLE 4.15.1 — Young examined the basal metabolism of 26 college women
in two groups of $n_1 = 15$ and $n_2 = 11$; $\bar X_1 = 34.45$ and $\bar X_2 = 33.57$ cal./sq. m./hr.;
$\sum x_1^2 = 69.36$, $\sum x_2^2 = 13.66$. Test $H_0$: $\sigma_1 = \sigma_2$. Ans. F = 3.62, to be compared with
$F_{0.05} = 3.55$. (Data from Ph.D. thesis, Iowa State University, 1940.)

Basal Metabolism of 26 College Women
(Calories per square meter per hour)

   7 or More Hours of Sleep          6 or Fewer Hours of Sleep

    1.  35.3     9.  33.3             1.  32.5     7.  34.6
    2.  35.9    10.  33.6             2.  34.0     8.  33.5
    3.  37.2    11.  37.9             3.  34.4     9.  33.6
    4.  33.0    12.  35.6             4.  31.8    10.  31.5
    5.  31.9    13.  29.0             5.  35.0    11.  33.8
    6.  33.7    14.  33.7             6.  34.6
    7.  36.0    15.  35.7
    8.  35.0

   $\sum X_1 = 516.8$                 $\sum X_2 = 369.3$
   $\bar X_1 = 34.45$ cal./sq. m./hr.    $\bar X_2 = 33.57$ cal./sq. m./hr.
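The answer to example 4.15.1 can be checked from the raw data. A minimal sketch using only the standard library; `sum_sq_dev` is our own helper name. It gives $\sum x_1^2 \approx 69.36$, $\sum x_2^2 \approx 13.66$ and F ≈ 3.63, the 3.62 of the answer to within rounding.

```python
from statistics import mean, variance

# Basal metabolism, cal./sq. m./hr. (data of example 4.15.1)
sleep_7_plus = [35.3, 35.9, 37.2, 33.0, 31.9, 33.7, 36.0, 35.0,
                33.3, 33.6, 37.9, 35.6, 29.0, 33.7, 35.7]
sleep_6_less = [32.5, 34.0, 34.4, 31.8, 35.0, 34.6,
                34.6, 33.5, 33.6, 31.5, 33.8]


def sum_sq_dev(x):
    """Sum of squared deviations from the mean, written sum x^2 in the text."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x)


ss1, ss2 = sum_sq_dev(sleep_7_plus), sum_sq_dev(sleep_6_less)
F = variance(sleep_7_plus) / variance(sleep_6_less)   # 14 and 10 d.f.
print(f"means {mean(sleep_7_plus):.2f}, {mean(sleep_6_less):.2f}")
print(f"sum x1^2 = {ss1:.2f}, sum x2^2 = {ss2:.2f}, F = {F:.2f}")
```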


EXAMPLE 4.15.2 — In the metabolism data there is little difference between the group
means, and the difference in variances can hardly reflect a correlation between variance and
mean. It might arise from non-random sampling, since the subjects are volunteers, or it
could be due to chance, since F is scarcely beyond the 5% level. As an exercise, test the
difference between the means (i) without assuming $\sigma_1 = \sigma_2$, (ii) making this assumption. Ans.
(i) $t' = 1.31$, $t'_{0.05} = 2.17$; (ii) $t = 1.19$, $t_{0.05} = 2.064$ (24 d.f.). There is no difference in the
conclusions.

EXAMPLE 4.15.3 — In the preceding example, show that 95% confidence limits for
$\mu_1 - \mu_2$ are $-0.58$ and 2.34 if we do not assume $\sigma_1 = \sigma_2$, and $-0.64$ and 2.40 if this
assumption is made.
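A sketch of examples 4.15.2 and 4.15.3 with the same data. The tabular values are typed in from a two-tailed 5% t table: 2.145 (14 d.f.), 2.228 (10 d.f.), and 2.064 (24 d.f., the pooled degrees of freedom $n_1 + n_2 - 2$). Variable names are ours.

```python
from math import sqrt
from statistics import mean, variance

x1 = [35.3, 35.9, 37.2, 33.0, 31.9, 33.7, 36.0, 35.0,
      33.3, 33.6, 37.9, 35.6, 29.0, 33.7, 35.7]    # 7 or more hours of sleep
x2 = [32.5, 34.0, 34.4, 31.8, 35.0, 34.6,
      34.6, 33.5, 33.6, 31.5, 33.8]                # 6 or fewer hours
n1, n2 = len(x1), len(x2)
d = mean(x1) - mean(x2)

# (i) sigma1 != sigma2: t' with Cochran's weighted 5% level
w1, w2 = variance(x1) / n1, variance(x2) / n2
se_i = sqrt(w1 + w2)
t_prime = d / se_i
t05_prime = (w1 * 2.145 + w2 * 2.228) / (w1 + w2)   # t table: 14 and 10 d.f.
lims_i = (d - t05_prime * se_i, d + t05_prime * se_i)

# (ii) sigma1 = sigma2: pooled s^2 with n1 + n2 - 2 = 24 d.f.
s2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
se_ii = sqrt(s2 * (1 / n1 + 1 / n2))
t = d / se_ii
lims_ii = (d - 2.064 * se_ii, d + 2.064 * se_ii)     # t table: 24 d.f.

print(f"(i)  t' = {t_prime:.2f} vs {t05_prime:.2f}; limits {lims_i[0]:.2f}, {lims_i[1]:.2f}")
print(f"(ii) t  = {t:.2f} vs 2.064; limits {lims_ii[0]:.2f}, {lims_ii[1]:.2f}")
```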

EXAMPLE 4.15.4 — If you wanted to test the null hypothesis $\sigma_1 = \sigma_2$ from the data in
table 4.14.1, would you use a one-tailed or a two-tailed test?
CHAPTER FIVE


Shortcut and non-parametric methods


5.1 — Introduction. In the preceding chapter you learned how to com- 
pare the means of two samples: paired or independent. The present 
chapter takes up several topics related to the same problem. For some 
years there has been continued activity in developing rapid and easy 
methods for dealing with samples from normal populations. In small 
samples, we saw that the range, as a substitute for the sample standard 
deviation, has remarkably high efficiency as compared to s. In section 
5.2 a method will be described for comparing the means of two samples, 
using the range in place of s. Often this test, which is quickly made, leads 
to definite conclusions, so that there is no necessity to compute Student’s 
t. This range test may also be employed as a rough check when there is
doubt whether t has been computed correctly.

To this point the normal distribution has been taken as the source
of most of our sampling. Fortunately, the statistical methods described
are also effective for moderately non-normal populations. But there is much
current interest in finding methods that work well for a wide variety of 
populations. Such methods, sometimes called distribution-free methods, 
are needed when sampling from populations that are far from normal. 
They are useful also, particularly in exploratory research, when the in- 
vestigator does not know much about the type of distribution being 
sampled. The best-known procedures of this type are described in sec- 
tions 5.3 to 5.7. 

5.2 — The t-test based on range. Lord (3) has developed an alternative
to the t-test in which the range replaces $s_{\bar X}$ in the denominator of t. This
test is used in the same way as t for testing a hypothesis or making interval
estimates. Pillai (4) has shown that for interval estimates the efficiency
of this procedure relative to t stays above 95% in samples up to
n = 20. Like t, the range test assumes a normal distribution. It has
become popular, particularly in industrial work.

Table A 7 (i) applies to single samples or to a set of differences obtained
from paired samples. The entries are the values of $(\bar X - \mu)/w$,
where w denotes the range of the sample. This ratio will be called $t_w$,
since it plays the role of t.

For an illustration of the setting of confidence intervals by means of
Lord's table, we use the vitamin C data from chapter 2. The sample
values were 16, 22, 21, 20, 23, 21, 19, 15, 13, 23, 17, 20, 29, 18, 22, 16, 25,
with $\bar X = 20$. We find $w = 29 - 13 = 16$ mg./100 gm., with n = 17.
Table A 7 (i) has the entry 0.144 in the column headed 0.05 and the row
for n = 17. The probability that $|t_w| < 0.144$ is 0.95 in random samples
of n = 17 from a normally distributed population. The 95% confidence
interval for $\mu$ is fixed by the inequalities

$$\bar X - t_w w \le \mu \le \bar X + t_w w$$

Substituting the vitamin C data,

$$20 - (0.144)(16) \le \mu \le 20 + (0.144)(16)$$

$$17.7 \le \mu \le 22.3 \text{ mg./100 gm.}$$

This is to be compared with the slightly narrower interval 17.95 to 22.05
based on s.
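Both intervals for the vitamin C data can be reproduced with the standard library. In this sketch 0.144 is the table A 7 (i) entry quoted above, and 2.120, the two-tailed 5% point of t for 16 d.f., is typed in by hand.

```python
from math import sqrt
from statistics import mean, stdev

vit_c = [16, 22, 21, 20, 23, 21, 19, 15, 13, 23, 17, 20, 29, 18, 22, 16, 25]
n = len(vit_c)
xbar = mean(vit_c)                          # 20
w = max(vit_c) - min(vit_c)                 # range, 16 mg./100 gm.

tw_05 = 0.144                               # table A 7 (i), n = 17, column 0.05
lo_w, hi_w = xbar - tw_05 * w, xbar + tw_05 * w

t_05 = 2.120                                # Student's t, 16 d.f., 5% level
se = stdev(vit_c) / sqrt(n)                 # s / sqrt(n)
lo_t, hi_t = xbar - t_05 * se, xbar + t_05 * se

print(f"range method: {lo_w:.1f} to {hi_w:.1f}")
print(f"t method:     {lo_t:.2f} to {hi_t:.2f}")
```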

The test of a null hypothesis by means of $t_w$ is illustrated by the paired
samples in chapter 4 showing the numbers of lesions on the two halves
of tobacco leaves under two preparations of virus. The eight differences
between the halves were 13, 3, 4, 6, -1, 1, 5, 1. Here the mean difference
$\bar D = 4$, while w = 14 and n = 8. For the null hypothesis that the two
preparations produce on the average equal numbers of lesions,

$$t_w = \frac{\bar D}{w} = \frac{4}{14} = 0.286$$

which is practically at the 5% level (0.288). The ordinary t-test gave a
significance probability of about 4%.
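For comparison, this sketch computes both $t_w$ and the ordinary paired t for the tobacco differences. The ordinary t comes out near 2.63, between the 5% (2.365) and 2% (2.998) points of t for 7 d.f.

```python
from math import sqrt
from statistics import mean, stdev

diffs = [13, 3, 4, 6, -1, 1, 5, 1]          # lesion differences, 8 leaves
d_bar = mean(diffs)                         # 4
w = max(diffs) - min(diffs)                 # 14
t_w = d_bar / w                             # compare with 0.288 (5%, n = 8)

t = d_bar / (stdev(diffs) / sqrt(len(diffs)))   # ordinary paired t, 7 d.f.
print(f"t_w = {t_w:.3f}, t = {t:.2f}")
```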

Table A 7 (ii) applies to two independent samples of equal size. The
mean of the two ranges, $\bar w = (w_1 + w_2)/2$, replaces the w of the preceding
paragraphs, and $\bar X_1 - \bar X_2$ takes the place of $\bar D$.

The test of significance will be applied to the numbers of worms
found in two samples of five rats, one sample treated previously by a worm
killer.

TABLE 5.2.1

Number of Worms Per Rat

                   Treated    Untreated
                     123         378
                     143         275
                     192         412
                      40         265
                     259         286

Means, $\bar X$     151.4       323.2
Ranges, w            219         147
We have $\bar X_2 - \bar X_1 = 171.8$ and $\bar w = (219 + 147)/2 = 183$. From this,
$t_w = (\bar X_2 - \bar X_1)/\bar w = 171.8/183 = 0.939$, which is beyond the 1% point,
0.896, shown in table A 7 (ii) for n = 5.

To find 95% confidence limits for the reduction in number of worms
per rat due to the treatment, we use the formula

$$(\bar X_2 - \bar X_1) - t_w \bar w \le \mu_2 - \mu_1 \le (\bar X_2 - \bar X_1) + t_w \bar w$$

$$171.8 - (0.613)(183) \le \mu_2 - \mu_1 \le 171.8 + (0.613)(183)$$

$$60 \le \mu_2 - \mu_1 \le 284$$


The confidence interval is wide, owing both to the small sample sizes and
to the high variability from rat to rat. Student's t, used in example 4.9.3
for these data, gave closely similar results both for the significance level
and the confidence limits.
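The two-sample range test for the worm data is sketched below. The 5% entry 0.613 for n = 5 is taken from the text's use of table A 7 (ii); `sample_range` is our own helper name.

```python
from statistics import mean

treated = [123, 143, 192, 40, 259]          # Table 5.2.1
untreated = [378, 275, 412, 265, 286]


def sample_range(x):
    """Largest minus smallest observation."""
    return max(x) - min(x)


diff = mean(untreated) - mean(treated)                         # 171.8
w_bar = (sample_range(treated) + sample_range(untreated)) / 2  # (219 + 147)/2
t_w = diff / w_bar

tw_05 = 0.613                               # table A 7 (ii), n = 5, 5% level
lower, upper = diff - tw_05 * w_bar, diff + tw_05 * w_bar
print(f"t_w = {t_w:.3f}; 95% limits {lower:.0f} to {upper:.0f}")
```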

For two independent samples of unequal sizes, Moore (1) has given
tables for the 10%, 5%, 2%, and 1% levels of Lord's test to cover all cases
in which the sample sizes $n_1$ and $n_2$ are both 20 or less.

The range method can also be used when the sample size exceeds 20. 
With two samples each of size 24, for example, each sample may be divided 
at random into two groups of size 12. The range is found for each group, 
and the average of the four ranges is taken. Lord (3) gives the necessary 
tables. This device keeps the efficiency of the range test high for samples 
greater than 20, though the calculation takes a little longer. 

To summarize, the range test is convenient for normal samples if a
5% to 10% loss in information can be tolerated. It is much used when
many routine tests of significance or calculations of confidence limits have
to be made. It is more sensitive than t to skewness in the population and
to the appearance of gross errors.

EXAMPLE 5.2.1 — In a previous example the differences in the serum albumen found
by two methods A and B in eight blood samples were 0.6, 0.7, 0.8, 0.9, 0.3, 0.5, -0.5, 1.3
gm. per 100 ml. Apply the range method to test the null hypothesis that there is no consistent
difference in the amount of serum albumen found by the two methods. Ans. $t_w = 0.32$,
P < 0.05.

EXAMPLE 5.2.2 — In this example, given by Lord (3), the data are the times taken for
an aqueous solution of glycerol to fall between two fixed marks. In five independent
determinations in a viscometer, these times were 103.5, 104.1, 102.7, 103.2, and 102.6 seconds.
For satisfactory calibration of the viscometer, the mean time should be accurate to within
±1/2 sec., apart from a 1-in-20 chance. By finding the half-width of the 95% confidence
interval for $\mu$ by (i) the $t_w$ method and (ii) the t method, verify whether this requirement is
satisfied. Ans. No. Both methods give ±0.76 for the half-width.

EXAMPLE 5.2.3 — In 15 kernels of corn the crushing resistance of the kernels, in
pounds, ranged from 25 to 65 with a mean of 43.0. Another sample of 15 kernels, harvested
at a different stage, ranged from 29 to 67 with a mean of 48.0. Test whether the difference
between the means is significant. Ans. No, $t_w = 0.128$. Note that since the ranges of the
two samples indicate much overlap, one could guess that the test will not show a significant
difference.
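A short sketch that checks the three answers above. For example 5.2.2 only the t half-width is computed, with 2.776 typed in as the two-tailed 5% point of t for 4 d.f.; the $t_w$ half-width would need the table A 7 (i) entry for n = 5, which is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev

# Example 5.2.1: paired differences in serum albumen, single-sample t_w
albumen = [0.6, 0.7, 0.8, 0.9, 0.3, 0.5, -0.5, 1.3]
tw_albumen = mean(albumen) / (max(albumen) - min(albumen))

# Example 5.2.2: half-width of the 95% interval by the t method (4 d.f.)
times = [103.5, 104.1, 102.7, 103.2, 102.6]
half_t = 2.776 * stdev(times) / sqrt(len(times))

# Example 5.2.3: two samples of 15, using only the means and ranges
w_bar = ((65 - 25) + (67 - 29)) / 2
tw_corn = (48.0 - 43.0) / w_bar

print(round(tw_albumen, 2), round(half_t, 2), round(tw_corn, 3))
```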
5.3 — Median, percentiles, and order statistics. The median of a population
has the property that half the values in the population exceed it
and half fall short of it. To estimate the median from a sample, arrange
the observations in increasing order. When the sample values are
arranged in this way, they are often called the 1st, 2nd, 3rd, ... order
statistics. If the sample size n is odd, the sample median is the middle term
in this array. For example, the median of the observations 5, 1, 8, 3, 4
is 4. In general (n odd), the median is the order statistic whose number is
(n + 1)/2. With n even, there is no middle term, and the median is
defined as the average of the order statistics whose numbers are n/2 and
(n + 2)/2. The median of the observations 1, 3, 4, 5, 7, 8 is 4.5.

Like the mean, the median is a measure of the middle of a distribution. 
If the distribution is symmetrical about its mean, the mean and the 
median coincide. With highly skewed distributions like that of income 
per family or annual sales of firms, the median is often reported, because 
it seems to represent people’s concept of an average better than the mean. 
This point can be illustrated with small samples. As we saw, the median 
of the observations 1, 3, 4, 5, 8 is 4, while the mean is 4.2. If the sample 
values become 1, 3, 4, 5, 24, where the 24 simulates the introduction of a 
wealthy family or a large firm, the median is still 4, but the mean is 7.4.
Four of the five sample values now fall short of the mean, while only one 
exceeds it. Similarly, in the distribution of incomes per family in a 
country, it is not unusual to find that 65% of families have incomes below 
the mean, with only 35% above it. In this sense, the mean does not seem 
a good indicator of the middle of the distribution. Further, the sample 
median in our small sample is still 4 even if we do not know the value of 
the highest observation, but merely that it is very large. With this sample, 
the mean cannot be calculated at all. 
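The order-statistic rule can be written directly, without a library median function. A minimal sketch; `sample_median` is our own name. The last sample shows the median staying at 4 while the mean moves to 7.4.

```python
def sample_median(values):
    """Median from the order statistics: number (n + 1)/2 when n is odd,
    the average of numbers n/2 and (n + 2)/2 when n is even (1-based)."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]
    return (x[n // 2 - 1] + x[(n + 2) // 2 - 1]) / 2


for sample in ([5, 1, 8, 3, 4], [1, 3, 4, 5, 7, 8], [1, 3, 4, 5, 24]):
    print(sorted(sample), "median:", sample_median(sample),
          "mean:", sum(sample) / len(sample))
```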

The calculation of the median from a large sample is illustrated from
the data in table 5.3.1. This shows, for 179 records of cows, the number
of days between calving and the resumption of the oestrus cycle (16).
Many of the records are repeated observations from successive calvings of 
the same cow. This raises doubts about the conclusions drawn, but the 
data are intended merely for illustration. 


TABLE 5.3.1

Distribution of Number of Days From Calving to First Subsequent Oestrus
for a Holstein-Friesian Herd in Wisconsin

Class limits (days)    Frequency    Cumulative frequency
    0.5 -  20.5             8                 8
   20.5 -  40.5            33                41
   40.5 -  60.5            50                91
   60.5 -  80.5            32               123
   80.5 - 100.5            15               138
  100.5 - 120.5            20               158
  120.5 - 140.5            11               169
  140.5 - 160.5             6               175
  160.5 - 180.5             2               177
  180.5 - 200.5             1               178
  200.5 - 220.5             1               179
The frequency rises to a peak in the class from 40.5 days to 60.5 days. The
day corresponding to the greatest frequency was called the mode by Karl
Pearson. There is a secondary mode in the class from 100.5 to 120.5 days.
This bimodal feature, as well as the skewness, emphasizes the
non-normality of the distribution.

Since n = 179, the sample median is the order statistic that is 90th
from the bottom. To find this, cumulate the frequencies as shown in the
table until a cumulated frequency higher than 90 is reached, in this case
91. It is clear that the median is very close to the top of the 40.5-60.5 days
class. The median is found by interpolation. Assuming that the 50
observations in this class are evenly distributed between 40.5 and 60.5
days, the median is 49/50 of the way along the interval from 40.5 days to 60.5 days.
The general formula is

$$M = X_L + \frac{gI}{f} \tag{5.3.1}$$

where

  $X_L$ = value of X at lower limit of the class containing the median
        = 40.5 days
  g = order statistic number of the median minus cumulative frequency
      up to the upper limit of the previous class = 90 - 41 = 49
  I = class interval = 20 days
  f = frequency in class containing the median = 50

This gives

$$M = \text{Median} = 40.5 + \frac{(49)(20)}{50} = 60.1 \approx 60 \text{ days}$$
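Formula 5.3.1 is a linear interpolation within the class that contains the required order statistic. The sketch below applies it to table 5.3.1; `grouped_order_stat` is our own name. As a side check that we have added, the mean computed from the class midpoints comes out at about 69.9 days.

```python
# Table 5.3.1: lower class limits (days) and class frequencies
LOWER = [0.5, 20.5, 40.5, 60.5, 80.5, 100.5, 120.5, 140.5, 160.5, 180.5, 200.5]
FREQ = [8, 33, 50, 32, 15, 20, 11, 6, 2, 1, 1]
WIDTH = 20                                   # class interval I, days


def grouped_order_stat(k, lower=LOWER, freq=FREQ, width=WIDTH):
    """k-th order statistic by formula 5.3.1, X_L + g*I/f, assuming the
    observations are spread evenly through each class."""
    cum = 0
    for x_l, f in zip(lower, freq):
        if cum + f >= k:                     # class where the cumulative count reaches k
            return x_l + (k - cum) * width / f   # g = k minus count below the class
        cum += f
    raise ValueError("k exceeds the number of observations")


n = sum(FREQ)                                # 179
median = grouped_order_stat((n + 1) / 2)     # 90th order statistic
grouped_mean = sum((x_l + WIDTH / 2) * f for x_l, f in zip(LOWER, FREQ)) / n
print(f"n = {n}, median = {median:.1f} days, mean = {grouped_mean:.1f} days")
```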


The mean of the distribution turns out to be 69.9 days, considerably 
higher than the median because of the long positive tail. 

In large samples of size n from a normal distribution (6), the sample
median becomes normally distributed about the population median
with standard error $1.253\sigma/\sqrt{n}$. For this distribution, in which the
sample mean and median are estimates of the same quantity, the median 
is less accurate than the mean. As we have stated, however, the chief 
application of the median lies in non-normal distributions. 

There is a simple method of calculating confidence limits for the
population median that is valid for any continuous distribution. Two of
the order statistics serve as the upper and lower confidence limits. These
are the order statistics whose numbers are, approximately (7),

$$\frac{n + 1}{2} \pm \frac{z\sqrt{n}}{2} \tag{5.3.2}$$
where z is the normal deviate corresponding to the desired confidence
probability. For the sample of cows, using 95% confidence probability,
z = 2 and these numbers are $90 \pm \sqrt{179}$, that is, 77 and 103. The 95%
confidence limits are the numbers of days corresponding to the 77th and the
103rd order statistics. The actual numbers of days are found by adapting
formula 5.3.1 for the median.

$$\text{For 77: No. of days} = 40.5 + \frac{(36)(20)}{50} = 55 \text{ days}$$

$$\text{For 103: No. of days} = 60.5 + \frac{(12)(20)}{32} = 68 \text{ days}$$


The population median is between 55 and 68 days unless this is one of
those unusual samples that occur about once in twenty trials. The
reasoning behind this method of finding confidence limits is essentially that by
which confidence limits were found for the binomial in chapter 1. Formula
5.3.2 for finding the two order statistics is a large-sample approximation,
but is adequate for practical purposes down to n = 25.

In reporting on frequency distributions from large samples,
investigators often quote percentiles of the distributions. The 90th percentile of
a distribution of students' I.Q. scores is the I.Q. value such that 90% of
the students fall short of it and only 10% exceed it.

In estimating percentiles, a useful result (7) is that in any continuous
frequency distribution the Pth percentile is estimated by the order statistic
whose number is (n + 1)P/100. For the 179 cows, the 90th percentile is
estimated by the order statistic whose number is (180)(90)/100 = 162.
By again using formula 5.3.1, the number of days corresponding to the
162nd order statistic is found as

$$120.5 + \frac{(4)(20)}{11} = 128 \text{ days}$$
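Formula 5.3.2 and the percentile rule reuse the same interpolation. This sketch repeats the helper so that it runs on its own, keeps the text's rounded z = 2, and rounds the order-statistic numbers to the nearest integer as the text does.

```python
from math import sqrt

LOWER = [0.5, 20.5, 40.5, 60.5, 80.5, 100.5, 120.5, 140.5, 160.5, 180.5, 200.5]
FREQ = [8, 33, 50, 32, 15, 20, 11, 6, 2, 1, 1]   # Table 5.3.1
WIDTH = 20


def grouped_order_stat(k, lower=LOWER, freq=FREQ, width=WIDTH):
    """k-th order statistic by formula 5.3.1 (even spread within classes)."""
    cum = 0
    for x_l, f in zip(lower, freq):
        if cum + f >= k:
            return x_l + (k - cum) * width / f
        cum += f
    raise ValueError("k exceeds the number of observations")


n = sum(FREQ)                                # 179
z = 2                                        # the text's rounded 95% normal deviate
k_lo = round((n + 1) / 2 - z * sqrt(n) / 2)  # formula 5.3.2, lower: 77
k_hi = round((n + 1) / 2 + z * sqrt(n) / 2)  # upper: 103
k_p90 = (n + 1) * 90 / 100                   # (n + 1)P/100 for P = 90: 162

lo_days = grouped_order_stat(k_lo)
hi_days = grouped_order_stat(k_hi)
p90_days = grouped_order_stat(k_p90)
print(k_lo, k_hi, round(lo_days), round(hi_days), round(p90_days))
```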

EXAMPLE 5.3.1 — From a sample whose values are 8, 9, 2, 7, 3, 12, 15, estimate (i) the
median, (ii) the lower quartile of the population (the lower quartile is the 25th percentile,
having one-quarter of the population below it and three-quarters above), (iii) the 80th
percentile. Ans. (i) 8, (ii) 3, (iii) 13.2. For the 80th percentile, the number of the order statistic
is 6.4. Since the 6th and 7th order statistics have values 12 and 15, respectively, linear
interpolation gives 13.2 for the 6.4th order statistic. Note that from this small sample we
cannot estimate the 90th percentile, beyond saying that our estimate exceeds 15.
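A sketch of the (n + 1)P/100 rule with linear interpolation between neighbouring order statistics, applied to example 5.3.1. The function name is ours; it refuses percentiles whose order-statistic number falls outside 1 to n, which is why the 90th percentile (number 7.2) cannot be estimated from this sample.

```python
def percentile_estimate(values, p):
    """Estimate the p-th percentile by the order statistic numbered (n + 1)p/100,
    interpolating linearly between neighbouring order statistics."""
    x = sorted(values)
    n = len(x)
    r = (n + 1) * p / 100                    # order-statistic number, 1-based
    if r < 1 or r > n:
        raise ValueError("percentile lies beyond the sample's extremes")
    lo = int(r)                              # whole part of the number
    frac = r - lo
    if frac == 0:
        return x[lo - 1]
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])


sample = [8, 9, 2, 7, 3, 12, 15]
for p in (50, 25, 80):
    print(p, round(percentile_estimate(sample, p), 1))
```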

5.4 — The sign test. Often there is no scale for measuring a character,
yet one believes that he can distinguish grades of merit. The animal
husbandman, for example, judges body conformation, ranking the
individuals from high to low, then assigning ranks 1, 2, ..., n. In the same
way, the foods expert arrays preparations according to flavor or
palatability. If rankings of a set of individuals or treatments are made by a
random sample of judges, inferences can be made about the ranking in
the population from which the sample of judges was drawn, despite
the fact that the parameters of the distributions cannot be written down.
First consider the rankings of two products by each of m judges. As
an example, m = 8 judges ranked patties of ground beef which had been
stored for 8 months at two temperatures in home freezers (17). Flavor was 
the basis of the ranking. Eight of the patties, one for each judge, were kept 
at 0°F.; the second sample of 8 were in a freezer whose temperature 
fluctuated between 0° and 15°F. The rankings are shown in table 5.4.1. 


TABLE 5.4.1

Rankings of the Flavor of Pairs of Patties of Ground Beef
(Eight judges. Rank 1 is high; rank 2, low)

             Sample 1      Sample 2
Judge          0°F.       Fluctuated
  A             1             2
  B             1             2
  C             2             1
  D             1             2
  E             1             2
  F             1             2
  G             1             2
  H             1             2


There are two null hypotheses that might be considered for these 
data. One is that the fluctuation in temperature produces no detectable 
difference in flavor. (If this hypothesis is true, however, one would expect 
some of the judges to report that their two patties taste alike and to be un- 
willing to rank them.) A second null hypothesis is that there is a dif- 
ference in flavor, and that in the population from which the judges were 
drawn, half the members prefer the patties kept at 0°F. and half prefer 
the other patties. Both hypotheses have the same consequence as regards 
the experimental data — namely, that for any judge in the sample, the 
probability is 1/2 that the 0°F. patty will be ranked 1. The reasons for 
this statement are different in the two cases. Under the first null hy- 
pothesis, the probability is 1/2 because the rankings are arbitrary; under 
the second, because any judge drawn into the sample has a probability 
1/2 of being a judge who prefers the 0°F. patty. 

In the sample, 7 out of 8 judges preferred the 0°F. patty. On either
null hypothesis, we expect 4 out of 8. The problem of testing this
hypothesis is exactly the same as that for which the $\chi^2$ test was introduced
in sections 1.10 and 1.11 of chapter 1. From the general formula in section
1.12,

$$\chi^2 = \frac{(7 - 4)^2}{4} + \frac{(1 - 4)^2}{4} = 4.5$$

When testing the null hypothesis that the probability is 1/2, a slightly
simpler version of this formula is

$$\chi^2 = \frac{(a - b)^2}{n} = \frac{(7 - 1)^2}{8} = 4.5$$

where a and b are the observed numbers in the two classes (0°F. and
Fluctuated).

Since the sample is small, we introduce a correction for continuity,
described in section 8.6, and compute $\chi^2$ as

$$\chi^2 = \frac{(|a - b| - 1)^2}{n} = \frac{(6 - 1)^2}{8} = 3.12, \qquad P = 0.078$$

The expression $|a - b| - 1$ means that we reduce the absolute value of
$(a - b)$ by 1 before squaring. The test indicates non-significance, though
the decision is close.
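The χ² form of the sign test, with and without the continuity correction, can be sketched with the standard library. The upper-tail probability of χ² with 1 d.f. is computed from the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$; `sign_test_chi2` is our own name. The corrected value gives P ≈ 0.077, close to the 0.078 quoted.

```python
from math import erfc, sqrt


def sign_test_chi2(a, b, continuity=True):
    """Chi-square (1 d.f.) for H0: P(+) = 1/2 from a plus and b minus signs,
    together with its upper-tail probability."""
    n = a + b
    dev = abs(a - b)
    if continuity:
        dev = max(dev - 1, 0)                # reduce |a - b| by 1 before squaring
    chi2 = dev ** 2 / n
    p = erfc(sqrt(chi2 / 2))                 # P(chi-square_1 > chi2) = P(|Z| > sqrt(chi2))
    return chi2, p


chi2_raw, p_raw = sign_test_chi2(7, 1, continuity=False)
chi2_cc, p_cc = sign_test_chi2(7, 1)
print(f"uncorrected {chi2_raw:.2f} (P = {p_raw:.3f}); corrected {chi2_cc:.3f} (P = {p_cc:.3f})")
```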

In this example we used the $\chi^2$ test, in place of the t-test for paired
samples, because the individual observations, instead of being distributed
normally, take only the values 1 or 2, so that the differences within a
pair are either +1 or -1. The same test is often used with continuous
or discrete data, either because the investigator wishes to avoid the
assumption of normality or as a quick substitute for the t-test. The
procedure is known as the sign test (8), because the differences between the
members of a pair are replaced by their signs (+ or -), the size of the
difference being ignored. In the formula for $\chi^2$, a and b are the numbers
of + and - signs, respectively. Any zero difference is omitted from the
test, so that n = a + b.

When the sign test is applied to a variate X that has a continuous or
discrete distribution, the null hypothesis is that X has the same
distribution under the two treatments. But the null hypothesis does not need to
specify the shape of this distribution. In the t-test, on the other hand, the
null hypothesis assumes normality and specifies that the parameter
$\mu$ (the mean) is equal for the two treatments. For this reason the t-test
is sometimes called a parametric test, while the sign test is called
non-parametric. Similarly, the median and other order statistics are
non-parametric estimates, since they estimate percentiles of any continuous
distribution without our needing to define the shape of the distribution
by means of parameters.

In sampling from normal distributions the efficiency of the sign test
relative to the t-test is about 65%. This statement implies that if the null
hypothesis is false, so that the means of the two populations differ by an
amount $\delta$, a sign test based on 18 pairs and a t-test based on 12 pairs have
about the same probability of detecting this by finding a significant
difference. The sign test saves time at the expense of a loss of sensitivity in
the test.

For numbers of pairs up to 20, table A 8 (p. 554), intended for quick
reference, shows the smaller number of like signs required for significance
at the 1%, 5%, and 10% levels. For instance, with 18 pairs, we must have
4 or fewer of one sign and 14 or more of the other sign in order to attain 5%
significance. This table was computed not from the $\chi^2$ approximation
but from the exact binomial distribution. Since this distribution is
discontinuous, we cannot find sample results that lie precisely at the 5% level.
The significance probabilities, which are often substantially lower than
the nominal significance levels, are shown in parentheses in table A 8.
The finding of 4 negative and 14 positive signs out of 18 represents a
significance probability of 0.031 instead of the nominal 0.05. For
one-tailed tests these probabilities should be halved.
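Probabilities like those in parentheses in table A 8 come from the binomial distribution with p = 1/2. A sketch of the two-tailed calculation, using `math.comb` (Python 3.8 or later); `sign_test_exact` is our own name. For a one-tailed test, halve the result, as the text says.

```python
from math import comb


def sign_test_exact(k, n):
    """Two-tailed exact probability of k or fewer of the rarer sign among
    n non-zero differences when P(+) = 1/2."""
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(round(sign_test_exact(4, 18), 3))     # 4 minus and 14 plus signs
```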

EXAMPLE 5.4.1 — On being presented with a choice between two sweets, differing
in color but otherwise identical, 15 out of 20 children chose color B. Test whether this is
evidence of a general preference for B (i) by $\chi^2$, (ii) by reference to table A 8. Do the results
agree?

EXAMPLE 5.4.2 — Two ice creams were made with different flavors but otherwise 
similar. A panel of 6 expert dairy industry men all ranked flavor A as preferred. Is this 
statistical evidence that the consuming public will prefer A? 

EXAMPLE 5.4.3 — To illustrate the difference between the sign test and the t-test in
extreme situations, consider the two samples, each of 9 pairs, in which the actual differences
are as follows. Sample I: -1, 1, 2, 3, 4, 4, 6, 7, 10. Sample II: 1, 1, 2, 3, 4, 4, 6, 7, -10.
In both samples the sign test indicates significance at the 5% level, with P = 0.039 from
table A 8. In sample I, in which the negative sign occurs for the smallest difference, we
find t = 3.618, with 8 d.f., the significance probability being 0.007. In sample II, where the
largest difference is the one with the negative sign, t = 1.212, with P = 0.26, approximately.
Verify that Lord's test shows $t_w = 0.364$ for sample I and 0.118 for sample II, and gives
verdicts in good agreement with the t-test. When the aberrant signs represent extreme
observations, the sign test and the t-test do not agree well. This does not necessarily mean
that the sign test is at fault; if the extreme observation were caused by an undetected gross
error, the verdict of the t-test might be misleading.
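The three criteria in example 5.4.3 can be computed side by side. In this sketch each sample gives the paired t (8 d.f.), Lord's single-sample $t_w$, and the count of minus signs used by the sign test; the variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev

samples = {
    "I": [-1, 1, 2, 3, 4, 4, 6, 7, 10],
    "II": [1, 1, 2, 3, 4, 4, 6, 7, -10],
}
results = {}
for name, d in samples.items():
    t = mean(d) / (stdev(d) / sqrt(len(d)))  # paired t, 8 d.f.
    t_w = mean(d) / (max(d) - min(d))        # Lord's single-sample ratio
    minus = sum(1 for v in d if v < 0)       # number of minus signs (sign test)
    results[name] = (t, t_w, minus)
    print(f"sample {name}: t = {t:.3f}, t_w = {t_w:.3f}, minus signs = {minus}")
```

Both samples have one minus sign out of nine, so the sign test cannot tell them apart, whereas t and $t_w$ both react to the size of the aberrant difference.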

5.5 — Non-parametric methods: ranking of differences between
measurements. The signed rank test, due to Wilcoxon (2), is another
substitute for the t-test in paired samples. First, the absolute values of the
differences (ignoring signs) are ranked, the smallest difference being
assigned rank 1. Then the signs are restored to the rankings. The method is
illustrated from an experiment by Collins et al. (9). One member of a pair
of corn seedlings was treated by a small electric current, the other being
untreated. After a period of growth, the differences in elongation
(treated - untreated) were recorded for each of ten pairs.

In table 5.5.1 the ranks with negative signs total 15 and those with
positive signs total 40. The test criterion is the smaller of these totals, in
this case 15. The ranks with the less frequent sign will usually, though
not always, give the smaller rank total. This number, sign ignored, is
referred to table A 9. For 10 pairs a rank sum of 8 or less is required for
rejection at the 5% level. Since 15 > 8, the data give no evidence against the
null hypothesis that elongation was unaffected by the electric current treatment.

TABLE 5.5.1
Example of Wilcoxon's Signed Rank Test
(Differences in elongation of treated and untreated seedlings)

Pair    Difference (mm.)    Signed Rank
  1           6.0                 5
  2           1.3                 1
  3          10.2                 7
  4          23.9                10
  5           3.1                 3
  6           6.8                 6
  7          -1.5                -2
  8         -14.7                -9
  9          -3.3                -4
 10       (illegible)             8

The null hypothesis in this test is that the frequency distribution of the original measurements is the same for the treated and untreated members of a pair, but as in the sign test the shape of this frequency distribution need not be specified. A consequence of this null hypothesis is that each rank is equally likely to have a + or a − sign. The frequency distribution of the smaller rank sum was worked out by the rules of probability as described by Wilcoxon (2). Since this distribution is discontinuous, the significance probabilities for the entries in table A 9 are not exactly 5% and 1%, but are close enough for practical purposes.
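For small n the exact distribution of the smaller rank sum can be reproduced by brute force. The sketch below is our own illustration, not the procedure Wilcoxon used to build table A 9; it assumes n untied, nonzero differences, treats all 2^n sign patterns as equally likely, and counts those whose smaller rank sum is at most t. The function name `signed_rank_null_count` is hypothetical.

```python
from itertools import product

def signed_rank_null_count(n, t):
    """Count sign patterns of the ranks 1..n whose smaller rank sum is <= t.

    Under the null hypothesis each rank is equally likely to carry a + or a -
    sign, so all 2**n patterns are equally probable. Returns (count, 2**n).
    """
    total = n * (n + 1) // 2
    count = 0
    for signs in product((1, -1), repeat=n):
        positive = sum(r for r, s in zip(range(1, n + 1), signs) if s > 0)
        if min(positive, total - positive) <= t:
            count += 1
    return count, 2 ** n

hits, cases = signed_rank_null_count(10, 8)
print(hits, cases, hits / cases)   # 50 1024 0.048828125
```

For 10 pairs the 5% entry of 8 thus corresponds to an exact two-tailed probability of about 0.049, which illustrates why the tabulated levels are only approximately 5% and 1%. The enumeration doubles with each added pair, so it is practical only for modest n.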

If two or more differences are equal, it is often sufficiently accurate to assign to each of the ties the average of the ranks that would be assigned to the group. Thus, if two differences are tied in the fifth and sixth positions, assign rank 5 1/2 to each of them.
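The ranking rule, including the averaging of tied ranks, can be written as a short function. This is an illustrative sketch with a hypothetical name, `signed_ranks`; it assumes zero differences have already been removed (a convention we adopt here, since zeros are not treated at this point in the text) and is applied to the ten differences of example 5.5.1.

```python
def signed_ranks(diffs):
    """Rank |d| from smallest (rank 1) upward, give tied |d| the average of
    the ranks they would occupy, then restore the signs of the d's.
    Assumes no zero differences are present."""
    if any(d == 0 for d in diffs):
        raise ValueError("remove zero differences first")
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute values starting at i.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average = (i + j + 2) / 2        # mean of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]

diffs = [1.65, 3.19, 5.87, 0.81, -0.55, 1.02, 3.42, 1.95, 0.55, -1.41]
ranks = signed_ranks(diffs)
negative_total = -sum(r for r in ranks if r < 0)
positive_total = sum(r for r in ranks if r > 0)
print(ranks)                            # [6.0, 8.0, 10.0, 3.0, -1.5, 4.0, 9.0, 7.0, 1.5, -5.0]
print(negative_total, positive_total)   # 6.5 48.5
```

The two tied differences of 0.55 each receive rank 1.5, and the smaller total, 6.5, is the value referred to table A 9.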

If the number of pairs n exceeds 16, calculate the approximate normal deviate

$$Z = \frac{|\mu - T| - \tfrac{1}{2}}{\sigma},$$

where T is the smaller rank sum, and

$$\mu = \frac{n(n+1)}{4}, \qquad \sigma = \sqrt{\frac{(2n+1)\,\mu}{6}}.$$

The number −1/2 is a correction for continuity. As usual, Z > 1.96 signifies rejection at the 5% level.
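The large-sample formula is easy to check numerically. A minimal sketch using only Python's standard library (`math.erfc(z / sqrt(2))` gives the two-tailed normal probability); it reproduces the figures asked for in example 5.5.4 with n = 16 and T = 29. The function name is our own.

```python
import math

def signed_rank_normal_deviate(n, T):
    """Approximate normal deviate for the smaller signed-rank sum T."""
    mu = n * (n + 1) / 4
    sigma = math.sqrt((2 * n + 1) * mu / 6)
    z = (abs(mu - T) - 0.5) / sigma          # the -1/2 corrects for continuity
    p_two_tailed = math.erfc(z / math.sqrt(2))
    return mu, sigma, z, p_two_tailed

mu, sigma, z, p = signed_rank_normal_deviate(16, 29)
print(mu, round(sigma, 2), round(z, 2))   # 68.0 19.34 1.99
print(round(p, 4))
```

The two-tailed probability comes out close to 0.047, against the exact 0.053 quoted from table A 9.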

EXAMPLE 5.5.1 — From two J-shaped populations distributed like chi-square with d.f. = 1 (figure 1.13.1), two samples of n = 10 were drawn and paired at random:

Sample 1      1.98   3.30   5.91   1.05   1.01   1.44   3.42   2.17   1.37   1.13
Sample 2      0.33   0.11   0.04   0.24   1.56   0.42   0.00   0.22   0.82   2.54
Difference    1.65   3.19   5.87   0.81  -0.55   1.02   3.42   1.95   0.55  -1.41
Rank          6      8      10     3     -1.5    4      9      7      1.5   -5

The difference between the population means was 1. Apply the signed rank test. Ans. The smallest two absolute differences are tied, so each is assigned the rank (1 + 2)/2 = 1.5. The sum of the negative ranks is 6.5, between the critical sums, 3 and 8, in table A 9; H_0 is rejected with P = 0.04, approximately.

EXAMPLE 5.5.2 — If you had not known that the differences in the foregoing example were from a non-normal population, you would doubtless have applied the t-test. Would you have drawn any different conclusions? Ans. t = 2.48, P = 0.04.

EXAMPLE 5.5.3 — Apply the signed rank test to samples I and II of example 5.4.3. Verify that the results agree with those given by the t-test and not with those given by the sign test. Is this what you would expect?

EXAMPLE 5.5.4 — For 16 pairs, table A 9 states that the 5% level of the smaller rank sum is 29, the exact probability being 0.053. Check the normal approximation in this case by showing that μ = 68, σ = 19.34, so that for T = 29 the value of Z is 1.99, corresponding to a significance probability of 0.047.

5.6 — Non-parametric methods: ranking for unpaired measurements. Turning now to the two-sample problems of chapter 4, we consider ranking as a non-parametric method for random samples of measurements which do not conform to the usual models. This test was also developed by Wilcoxon (2), though it is sometimes called the Mann-Whitney test (11). A table due to White (12) applies to unequal group sizes as well as equal. All observations in both groups are put into a single array, care being taken to tag the numbers of each group so that they can be distinguished. Ranks are then assigned to the combined array. Finally, the smaller sum of ranks, T, is referred to table A 10 to determine significance. Note that small values of T cause rejection.

An example is drawn from the Corn Borer project in Boone County, 
Iowa. It is well established that, in an attacked field, more eggs are de- 
posited on tall plants than on short ones. For illustration we took records 
of numbers of eggs found in 20 plants in a rather uniform field. The 
plants were in 2 randomly selected sites, 10 plants each. Table 5.6.1 con- 
tains the egg counts. 


TABLE 5.6.1
Number of Corn Borer Eggs on Corn Plants, Boone County, Iowa, 1950

Height of Plant    Number of Eggs
Less than 23"       0   14   18    0   31    0    0    0   11    0
More than 23"      37   42   12   32  105   84   15   47   51   65

In years such as 1950 the frequency distribution of number of eggs 
tends to be J-shaped rather than normal. At the low end, many plants
have no eggs, but there is also a group of heavily infested plants. Normal 
theory cannot be relied upon to yield correct inferences from small 
samples. 

For convenience in assigning ranks, the counts were rearranged in increasing order (table 5.6.2). The counts for the tall plants are marked with an asterisk.

TABLE 5.6.2
Egg Counts Arranged in Increasing Order, with Ranks
(* indicates counts on plants 23" or more)

Count   0     0     0     0     0     0     11   12*   14   15*   18   31
Rank    3.5   3.5   3.5   3.5   3.5   3.5   7    8     9    10    11   12

The eight highest counts are omitted, since they are all on tall plants and it is clear that the small plants give the smaller rank sum.

By the rule suggested for tied ranks, the six ties are given the rank 3.5, this being the average of the numbers 1 to 6. In this instance the average is not necessary, since all the tied ranks belong to one group; the sum of the six ranks, 21, is all that we need. But if the tied counts were in both groups, averaging would be required.

The next step is to add the rank numbers in the group (plants less than 23 in.) that has the smaller sum:

T = 21 + 7 + 9 + 11 + 12 = 60

This sum is referred to table A 10 with n_1 = n_2 = 10. Since T is less than T_0.01 = 71, the null hypothesis is rejected with P < 0.01. The anticipated conclusion is that plant height affects the number of eggs deposited.

When the samples are of unequal sizes n_1, n_2, an extra step is required. First, find the total T_1 of the ranks for the sample that has the smaller size, say n_1. Compute T_2 = n_1(n_1 + n_2 + 1) − T_1. Then T, which is referred to table A 10, is the smaller of T_1 and T_2. To illustrate, White quotes Wright's data (10) on the survival times, under anoxic conditions, of the peroneal nerves of 4 cats and 14 rabbits. For the cats, the times were 25, 33, 43, and 45 minutes; for the rabbits, 15, 16, 16, 17, 20, 22, 22, 23, 28, 28, 30, 30, 35, and 35 minutes. The ranks for the cats are 9, 14, 17, and 18, giving T_1 = 58. Hence, T_2 = 4(19) − 58 = 18, which is smaller than T_1, so that T = 18. For n_1 = 4, n_2 = 14, the 5% level of T is 19. The mean survival time of the nerves is significantly higher for the cats than for the rabbits.
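Both worked examples of this section follow the same steps, so they can share one sketch. The helper names `midranks` and `rank_sum_T` are hypothetical; the code pools the two samples, assigns ranks with averaged ties, totals the ranks of the smaller sample, and applies the T_2 = n_1(n_1 + n_2 + 1) − T_1 step.

```python
def midranks(values):
    """Ranks 1..N of the pooled values; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks

def rank_sum_T(sample_a, sample_b):
    """Two-sample rank criterion following the steps in the text:
    T1 = rank total of the smaller sample (size n1),
    T2 = n1(n1 + n2 + 1) - T1, and T = min(T1, T2)."""
    if len(sample_a) <= len(sample_b):
        small, large = sample_a, sample_b
    else:
        small, large = sample_b, sample_a
    n1, n2 = len(small), len(large)
    ranks = midranks(list(small) + list(large))
    T1 = sum(ranks[:n1])                 # the smaller sample comes first
    T2 = n1 * (n1 + n2 + 1) - T1
    return min(T1, T2)

short = [0, 14, 18, 0, 31, 0, 0, 0, 11, 0]        # plants less than 23"
tall = [37, 42, 12, 32, 105, 84, 15, 47, 51, 65]   # plants more than 23"
cats = [25, 33, 43, 45]
rabbits = [15, 16, 16, 17, 20, 22, 22, 23, 28, 28, 30, 30, 35, 35]
print(rank_sum_T(short, tall))    # 60.0
print(rank_sum_T(cats, rabbits))  # 18.0
```

With equal sample sizes, T_2 is simply the rank total of the other sample, so the same rule returns "the smaller sum of ranks" used for the corn borer data.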

For values of n_1 and n_2 outside the limits of the table, calculate

$$Z = \frac{|\mu - T| - \tfrac{1}{2}}{\sigma},$$

where

$$\mu = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma = \sqrt{\frac{n_2\,\mu}{6}}.$$

The approximate normal deviate Z is referred to the tables of the normal distribution to give the significance probability P.
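A minimal sketch of the two-sample normal approximation, with a hypothetical function name. Note that n_2·μ/6 equals n_1n_2(n_1 + n_2 + 1)/12, the usual variance of a rank total. The corn borer figures are used below only to illustrate the arithmetic; the text prescribes the formula for sample sizes beyond table A 10.

```python
import math

def rank_sum_normal_deviate(n1, n2, T):
    """Normal approximation for the two-sample rank criterion T.
    n1 is the size of the smaller sample."""
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n2 * mu / 6)
    z = (abs(mu - T) - 0.5) / sigma          # -1/2 is the continuity correction
    return mu, sigma, z, math.erfc(z / math.sqrt(2))   # last item: two-tailed P

mu, sigma, z, p = rank_sum_normal_deviate(10, 10, 60)
print(mu, round(sigma, 2), round(z, 2))   # 105.0 13.23 3.36
```

A deviate of about 3.36 corresponds to a two-tailed P well below 0.01, in line with the conclusion drawn from table A 10.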

Table A 10 was calculated from the assumption that if the null hypothesis is true, the n_1 ranks in the smaller sample are a random selection from the (n_1 + n_2) ranks in the combined samples.
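The assumption behind table A 10 can be turned directly into a counting procedure: every choice of n_1 ranks out of 1, ..., n_1 + n_2 is equally likely. The brute-force sketch below (hypothetical name, no ties assumed) counts the choices whose criterion T is at most t. In the tiny case n_1 = n_2 = 2 there are six equally likely rank pairs, two of which give T = 3.

```python
from itertools import combinations

def rank_sum_null_count(n1, n2, t):
    """Count equally likely choices of n1 ranks from 1..n1+n2 whose criterion
    T = min(T1, n1(n1 + n2 + 1) - T1) is <= t.
    Returns (count, number of choices). Assumes no ties."""
    N = n1 + n2
    count = cases = 0
    for chosen in combinations(range(1, N + 1), n1):
        T1 = sum(chosen)
        if min(T1, n1 * (N + 1) - T1) <= t:
            count += 1
        cases += 1
    return count, cases

print(rank_sum_null_count(2, 2, 3))   # (2, 6)
hits, cases = rank_sum_null_count(4, 14, 19)
print(hits, cases, hits / cases)      # probability of T <= 19 for Wright's sizes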
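```

Enumeration is quick for Wright's sizes (C(18, 4) = 3,060 choices). For larger samples the normal approximation above is the practical route.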
5.7 — Comparison of rank and normal tests. When the t-test is used on non-normal data, two things happen. The significance probabilities are changed; the probability that t exceeds t_0.05 when the null hypothesis is true is no longer 0.05, but may be, say, 0.041 or 0.097. Secondly, the sensitivity or power of the test in finding a significant result when the null hypothesis is false is altered. Much of the work on non-parametric methods is motivated by a desire to find tests whose significance probabilities do not change and whose sensitivity relative to competing tests remains high when the data are non-normal.

With the rank tests, the significance levels remain the same for any continuous distribution, except that they are affected to some extent by ties, and by zeros in the signed rank test. In large normal samples, the rank tests have an efficiency of about 95% relative to the t-test (13), and in small normal samples, the signed rank test has been shown (14) to have an efficiency slightly higher than this. With non-normal data from a continuous distribution, the efficiency of the rank tests relative to t never falls below 86% in large samples and may be much greater than 100% for distributions that have long tails (13). Since they are relatively quickly made, the rank tests are highly useful for the investigator who is doubtful whether his data can be regarded as normal.

The beginner may wish to compute both the rank tests and the t-test
for some of his data to see how they compare. Needless to say, the prac- 
tice of quoting the test that agrees with one’s predilections vitiates the 
whole technique. 

As has been stated previously, most investigations, after the prelimi- 
nary stages, are designed to estimate the sizes of differences rather than 
simply to test null hypotheses. The rank methods can furnish estimates 
and confidence limits for the difference between two treatments (see 
examples 5.8.1 and 5.8.2). The calculations require no assumption of 
normality, but are a little tedious. Some work has also been done in ex- 
tending rank methods to the more complex types of data that we shall meet 
in later chapters, though the available techniques still fall short of the 
flexibility of the standard methods based on normality. 

5.8 — Scales with limited values. In some lines of work the scales of measurement are restricted to a small number of values, perhaps to 0, 1, 2 or 1, 2, 3, 4, 5. Investigators are sometimes puzzled as to how to test the differences between two treatments in this case, because the data do not look normal, while rank methods usually involve a substantial number of zeros and ties. We suggest that the ordinary t-test be used, with the inclusion of a correction for continuity. To illustrate, consider a paired test in which the original data are on a 0, 1, 2 scale. The differences between the members of a pair can then assume only the values 2, 1, 0, −1, and −2.

With 12 pairs, suppose that the differences D between two treatments A and B are 2, 2, 2, 1, 1, 1, 0, 0, 0, 0, −1, −1. Then ΣD = 7. There is a test, called Fisher's randomization test (15), that requires no assumption about the form of the basic distribution of these differences. The argument used is that if there is no difference between A and B, each of the 12 differences is equally likely to be + or −. Thus, under the null hypothesis there are 2^12 = 4,096 possible sets of sample results. Since, however, +0 and −0 are the same, only 2^8 = 256 need be examined. We then count how many samples have ΣD as great as or greater than 7, the observed ΣD. It is not hard to verify that 38 samples are of this kind if both positive and negative totals are counted so as to provide a two-tailed test. The significance probability is 38/256 = 0.148. The null hypothesis is not rejected by the randomization test.
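The count of 38 samples can be checked by listing all 256 sign patterns. A minimal sketch with a hypothetical function name; it assumes integer-valued differences, so the `>=` comparison of totals is exact (with decimal data a small tolerance would be prudent).

```python
from itertools import product

def paired_randomization_count(diffs):
    """Fisher's randomization test for paired differences, two-tailed.

    Zero differences are dropped, since +0 and -0 are the same. Every sign
    pattern of the remaining differences is equally likely under H0; count
    the patterns whose |sum D| is at least the observed |sum D|.
    """
    magnitudes = [abs(d) for d in diffs if d != 0]
    observed = abs(sum(diffs))
    hits = 0
    for signs in product((1, -1), repeat=len(magnitudes)):
        if abs(sum(s * m for s, m in zip(signs, magnitudes))) >= observed:
            hits += 1
    return hits, 2 ** len(magnitudes)

hits, cases = paired_randomization_count([2, 2, 2, 1, 1, 1, 0, 0, 0, 0, -1, -1])
print(hits, cases, round(hits / cases, 3))   # 38 256 0.148
```

The same function gives 12/256 = 3/64 for the differences of example 5.8.3.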

With this test the investigator must work out his own significance 
probability. From his writings it seems clear that Fisher did not intend 
the test for routine use, but merely to illustrate that a test can be made 
if A and B were assigned to the members of each pair by randomization. 

For scales with limited numbers of values, numerous comparisons of the results of this test and the t-test show that they usually agree well enough for practical purposes. In the randomization test, however, the possible values of ΣD jump by 2's. Our observed ΣD is 7. We would have ΣD = 9 if only one 1 had a − sign, and ΣD = 5 if three 1's had a − sign. To apply the correction for continuity, we compute t_c as

$$t_c = \frac{|\Sigma D| - 1}{n\,s_{\bar D}} = \frac{6}{(12)(0.313)} = 1.597,$$

where s_D̄ = 0.313 is computed in the usual way. With 11 d.f., P is 0.138, in good agreement with the randomization test. The denominator of t_c is the standard error of ΣD. This may be computed either as n s_D̄ or as √(n s_D²).

In applying the correction for continuity, the rule is to find the next highest value of ΣD, below the observed one, that the randomization set provides. The numerator of t_c is halfway between this value and the observed ΣD. The values of ΣD do not always jump by 2's.
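The corrected t is a one-line computation once the standard error of ΣD is written as √n·s_D. The sketch below uses a hypothetical function name, and the correction c must be supplied by the user from the randomization set (c = 1 when the totals jump by 2's). Carried without rounding, it gives t_c ≈ 1.598; the 1.597 in the text comes from rounding s_D̄ to 0.313.

```python
import math
from statistics import stdev

def paired_t_corrected(diffs, c):
    """t_c = (|sum D| - c) / (standard error of sum D), with n - 1 d.f.

    The standard error of sum D is sqrt(n) * s_D, which equals n times the
    standard error of the mean difference. c is half the step from the
    observed sum D to the next value below it in the randomization set.
    """
    n = len(diffs)
    se_sum = math.sqrt(n) * stdev(diffs)
    return (abs(sum(diffs)) - c) / se_sum

t_c = paired_t_corrected([2, 2, 2, 1, 1, 1, 0, 0, 0, 0, -1, -1], c=1)
print(round(t_c, 3))
```

Converting t_c to a P value requires the t distribution. If SciPy is available, `2 * scipy.stats.t.sf(t_c, df)` is one option; it is not used here so that the sketch relies only on the standard library.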

With two independent samples of size n, the randomization test assumes that on the null hypothesis the 2n observations have been divided at random into two samples of n. There are (2n)!/(n!)² cases. To apply the correction, find the next highest value of ΣD_1 − ΣD_2. If one sample has the values 2, 3, 3, 3 and the other has 0, 0, 0, 2, we have ΣD_1 = 11, ΣD_2 = 2, giving ΣD_1 − ΣD_2 = 9. The next highest value is 7, given by the case 2, 2, 3, 3 and 0, 0, 0, 3. Hence, the numerator of t_c is 8. The general formula for t_c is

$$t_c = \frac{|\Sigma D_1 - \Sigma D_2| - c}{\sqrt{n\,(s_1^2 + s_2^2)}}$$

with 2(n − 1) d.f., where s_1² and s_2² are the sample variances and c is the size of the correction.

With small samples that show little overlap, as in this example, the randomization test is easily calculated and is recommended, because in such cases t_c tends to give too many significant results. With sample values of 2, 3, 3, 3 and 0, 0, 0, 2, the observed result is the most extreme of the 8!/(4!)² = 70 cases. The randomization provides 4 cases like the observed one in a two-tailed test. P is therefore 4/70 = 0.057. The reader may verify that t_c = 3.58, with 6 d.f. and P near 0.01.
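Both figures for the two-sample illustration can be reproduced with a short sketch. The function names are hypothetical. Equal sample sizes are assumed, as in the text, and integer data are assumed so that comparisons of totals are exact. The randomization part enumerates all 8!/(4!)² = 70 splits of the pooled observations.

```python
import math
from itertools import combinations
from statistics import variance

def two_sample_randomization_count(a, b):
    """Two-tailed randomization test for two samples of equal size n.

    Every split of the 2n pooled observations into two groups of n is
    treated as equally likely under H0; count splits whose |sum1 - sum2|
    is at least as large as the observed one.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    pooled = list(a) + list(b)
    n, grand_total = len(a), sum(pooled)
    observed = abs(sum(a) - sum(b))
    hits = cases = 0
    for idx in combinations(range(len(pooled)), n):
        s1 = sum(pooled[i] for i in idx)
        if abs(s1 - (grand_total - s1)) >= observed:
            hits += 1
        cases += 1
    return hits, cases

def two_sample_t_corrected(a, b, c):
    """t_c = (|sum1 - sum2| - c) / sqrt(n (s1^2 + s2^2)), with 2(n - 1) d.f."""
    n = len(a)
    return (abs(sum(a) - sum(b)) - c) / math.sqrt(n * (variance(a) + variance(b)))

a, b = [2, 3, 3, 3], [0, 0, 0, 2]
print(two_sample_randomization_count(a, b))         # (4, 70)
print(round(two_sample_t_corrected(a, b, c=1), 2))  # 3.58
```

The contrast between P = 4/70 = 0.057 from the randomization and t_c = 3.58 with 6 d.f. is the case described above, where t_c overstates significance for small samples with little overlap.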

EXAMPLE 5.8.1 — In Wright's data, p. 131, show that if the survival time for each cat is reduced by 2 minutes, the value of T in the rank sum test becomes 18 1/2, while if the cat times are reduced by 3 minutes, T = 21. Show further that if 23 minutes are subtracted from each cat, we find T = 20 1/2, while for 24 minutes, T = 19. Since T_0.05 = 19, any hypothesis which states that the average survival time of cats exceeds that of rabbits by a figure between 3 and 23 minutes is accepted in a 5% test. The limits 3 and 23 minutes are 95% confidence limits as found from the rank sum test.

EXAMPLE 5.8.2 — In a two-sample comparison, the estimate of the difference between the two populations appropriate to the use of ranks is the median of the differences X_i − Y_j, where X_i and Y_j denote members of the first and second samples. In Wright's data, with n_1 = 4, n_2 = 14, there are 56 differences. Show that the median is 13. (You should be able to shortcut the work.)
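Examples 5.8.1 and 5.8.2 can be checked mechanically. The sketch below uses hypothetical helper names and repeats the ranking helper so that it stands alone. It shifts the cat times, recomputes T for each shift, and takes the median of all 56 cat-minus-rabbit differences. By our count that median is 13: 27 of the differences are 12 or less, and the next three are all 13.

```python
from statistics import median

def midranks(values):
    """Ranks 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks

def rank_sum_T(small, large):
    """T = min(T1, n1(n1 + n2 + 1) - T1), T1 = rank total of the smaller sample."""
    n1, n2 = len(small), len(large)
    T1 = sum(midranks(list(small) + list(large))[:n1])
    return min(T1, n1 * (n1 + n2 + 1) - T1)

cats = [25, 33, 43, 45]
rabbits = [15, 16, 16, 17, 20, 22, 22, 23, 28, 28, 30, 30, 35, 35]
for shift in (2, 3, 23, 24):
    print(shift, rank_sum_T([c - shift for c in cats], rabbits))
# 2 18.5 / 3 21.0 / 23 20.5 / 24 19.0

# Example 5.8.2: median of the 56 differences X_i - Y_j
print(median([c - r for c in cats for r in rabbits]))   # 13.0
```

Scanning shifts this way is a simple, if slow, route to the rank-based confidence limits. Only integer shifts are examined here, which matches the example.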

EXAMPLE 5.8.3 — In a paired two-sample test the ten values of the differences D were 3, 3, 2, 1, 1, 1, 1, 0, 0, −1. Show that the randomization test gives P = 3/64 = 0.047, while the value of t, corrected for continuity, is 2.457, corresponding to a P value of about 0.036.
CHAPTER SIX

Regression

6.1 — Introduction. In preceding chapters the problems considered have involved only a single measurement on each individual. In this chapter, attention is centered on the dependence of one variable Y on another variable X. In mathematics Y is called a function of X, but in statistics the term regression is generally used to describe the relationship. The growth curve of height is spoken of as the regression of height on age; in toxicology the lethal effects of a drug are described by the regression of per cent kill on the amount of the drug. The origin of the term regression will be explained in section 6.16. To distinguish the two variables in regression studies, Y is sometimes called the dependent and X the independent variable. These names are fairly appropriate in the toxicology example, in which we can think of the per cent kill Y as being caused by the amount of drug X, the amount itself being variable at the will of the investigator. They are less suitable, though still used, when, for example, Y is the weight of a man and X is his maximum girth.

Regression has many uses. Perhaps the objective is only to learn if Y 
does depend on X. Or, prediction of Y from X may be the goal. Some 
wish to determine the shape of the regression curve. Others are con- 
cerned with the error in Y in an experiment after adjustments have been 
made for the effect of a related variable X. An investigator may have a theory about cause and effect and employ regression to test this theory. To satisfy these various needs an extensive account of regression methods is necessary.

In the next two sections the calculations required in fitting a regression are introduced by a numerical example. The theoretical basis of these
calculations and the useful applications of regression are taken up in sub- 
sequent sections. 

6.2 — The regression of blood pressure on age. A project, "The Nutritional Status of Population Groups," was set up by the Agricultural Experiment Stations of nine midwestern states. From the facts learned we have extracted data on systolic blood pressure among 58 women over 30 years of age, a random sample from a region near Ames, Iowa (1). For present purposes, the ages are grouped into 10-year classes and the mean blood pressure calculated for each class. The results are in the first two columns of table 6.2.1.


TABLE 6.2.1
Mean Systolic Blood Pressure of 58 Women in 10-Year Age Classes

Midpoint of    Mean Blood    Deviations From Means    Squares             Products
Age Class      Pressure
X              Y             x         y              x²        y²        xy
35             114           -20       -27            400       729       540
45             124           -10       -17            100       289       170
55             143             0         2              0         4         0
65             158            10        17            100       289       170
75             166            20        25            400       625       500
Sum 275        705             0         0            1,000     1,936     1,380
Mean 55        141

Sample regression coefficient: b = Σxy/Σx² = 1,380/1,000 = 1.38 units of blood pressure per year


As in most regression problems, the first thing to do is to draw a graph,
figure 6.2.1. The independent variable X is plotted along the horizontal 
axis. Each measure of the dependent Y is indicated by a black circle 
above the corresponding X. Clearly, the trend of blood pressure with age 
is upward and roughly linear. 

The straight line drawn in the figure is the sample regression of Y on X. Its position is fixed by two results:

(i) It passes through the point O′(X̄, Ȳ), the point determined by the mean of each sample. For the blood pressures this is the point (55, 141).

(ii) Its slope is at the rate of b units of Y per unit of X, where b is the sample regression coefficient. Writing x = X − X̄ and y = Y − Ȳ, b = Σxy/Σx². The numerator of b is a new quantity, the sum of products of the deviations x and y. In table 6.2.1 the individual values of x² have been obtained in the fifth column and those of xy in the seventh column. In section 6.3 a quicker method of calculating b will be given. For the blood pressures, b = +1.38, meaning that blood pressure increases on the average by 1.38 units per year of age.

The sample regression equation of Y on X is now written as

ŷ = Ŷ − Ȳ = bx,

where Ŷ is the estimated value and ŷ the estimated deviation of Y corresponding to any x deviation. If x = 20 years, ŷ = (1.38)(20) = 27.6 units of blood pressure.
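The definition b = Σxy/Σx² can be followed step by step in code, mirroring the columns of table 6.2.1. This is a minimal sketch; the variable names are our own.

```python
X = [35, 45, 55, 65, 75]        # midpoints of the age classes, years
Y = [114, 124, 143, 158, 166]   # mean systolic blood pressure

n = len(X)
X_mean, Y_mean = sum(X) / n, sum(Y) / n
x = [Xi - X_mean for Xi in X]   # deviations from the mean of X
y = [Yi - Y_mean for Yi in Y]   # deviations from the mean of Y
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
b = sum_xy / sum_x2             # sample regression coefficient

print(X_mean, Y_mean, sum_x2, sum_xy, b)   # 55.0 141.0 1000.0 1380.0 1.38
print(round(b * 20, 1))                    # estimated deviation for x = 20 years: 27.6
```

Working with deviations, as here, is the direct form of the definition. Section 6.3 shows the shortcut that avoids computing each x and y.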




Fig. 6.2.1. Sample regression of blood pressure on age. The broken lines indicate omission of the lower parts of the scales in order to clarify the relations in the parts occupied by the data.


This equation enables us to complete figure 6.2.1 by drawing the sample regression line. Lay off O′M = 20 years to the right of O′, then erect a perpendicular, MP = 27.6 units of blood pressure. The line O′P then has the slope, 1.38 units of blood pressure per year.

In terms of the original units, the sample regression equation is

$$\hat{Y} = \bar{Y} + b(X - \bar{X}).$$

For the blood pressures, this becomes

$$\hat{Y} - 141 = 1.38(X - 55)$$

or

$$\hat{Y} = 141 + 1.38(X - 55) = 65.1 + 1.38X.$$

If X = 75 is entered in this equation, Ŷ becomes 65.1 + (1.38)(75) = 168.6 units of blood pressure. The corresponding point, (75, 168.6), is shown as P in the figure.

We can now compare the sample points with the corresponding Ŷ to get measures of the goodness of fit of the line to the data. Each X is substituted in the regression equation and Ŷ calculated. The five results are recorded in table 6.2.2. The deviations from regression, Y − Ŷ = d_y·x, measure the failure of the line to fit the data. In this sample, 45-year-old women had below average blood pressure and 65-year-olds had an excess.


TABLE 6.2.2
Calculation of Ŷ and Deviations From Regression, d_y·x = Y − Ŷ
(Blood pressure data)

Midpoint of    Mean Blood    Estimated Blood    Deviation From      Square of
Age Class      Pressure      Pressure           Regression          Deviation
X              Y             Ŷ                  Y − Ŷ = d_y·x       d²_y·x
35             114           113.4               0.6                 0.36
45             124           127.2              -3.2                10.24
55             143           141.0               2.0                 4.00
65             158           154.8               3.2                10.24
75             166           168.6              -2.6                 6.76
Sum                                                                 31.60

The sum of squares of deviations, Σd²_y·x = 31.60, is the basis for an estimate of error in fitting the line. The corresponding degrees of freedom are n − 2 = 3. We have then

$$s_{y\cdot x}^2 = \frac{\Sigma d_{y\cdot x}^2}{n - 2} = 10.53,$$

where s²_y·x is the mean square deviation from regression. The resulting sample standard deviation from regression,

$$s_{y\cdot x} = \sqrt{s_{y\cdot x}^2} = 3.24 \text{ units of blood pressure},$$

corresponds to s in single-variable problems. In particular, it furnishes a sample standard deviation of the regression coefficient,

$$s_b = \frac{s_{y\cdot x}}{\sqrt{\Sigma x^2}}.$$

This is 3.24/√1,000 = 0.102 units of blood pressure per year, with (n − 2) = 3 df. A test of significance of b is given by

$$t = \frac{b}{s_b}, \qquad df = n - 2.$$

Applying this to the blood pressures,

t = 1.38/0.102 = 13.5**, df = 3

Note: It is often convenient to denote significance by asterisks. A single one indicates probabilities between 0.05 and 0.01; two indicate probabilities equal to or less than 0.01.






Often there is little interest in the individual d_y·x of table 6.2.2. If so, Σd²_y·x may be calculated directly by the formula

$$\Sigma d_{y\cdot x}^2 = \Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2}.$$

Substituting the blood pressure data from table 6.2.1,

$$\Sigma d_{y\cdot x}^2 = 1{,}936 - \frac{(1{,}380)^2}{1{,}000} = 31.60,$$

as before.
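The whole chain from deviations to the t-test can be sketched as follows; the variable names are our own. The sketch computes Σd²_y·x both from the individual deviations and by the shortcut formula, to show that the two agree. Carried without intermediate rounding, t comes out near 13.4 rather than the 13.5 obtained from the rounded s_b = 0.102. Either value is far beyond the 1% level for 3 df.

```python
import math

X = [35, 45, 55, 65, 75]
Y = [114, 124, 143, 158, 166]
n = len(X)
X_mean, Y_mean = sum(X) / n, sum(Y) / n
sum_x2 = sum((Xi - X_mean) ** 2 for Xi in X)
sum_y2 = sum((Yi - Y_mean) ** 2 for Yi in Y)
sum_xy = sum((Xi - X_mean) * (Yi - Y_mean) for Xi, Yi in zip(X, Y))
b = sum_xy / sum_x2

Y_hat = [Y_mean + b * (Xi - X_mean) for Xi in X]
d = [Yi - Yh for Yi, Yh in zip(Y, Y_hat)]        # deviations from regression
ss_direct = sum(di * di for di in d)              # from the individual d's
ss_shortcut = sum_y2 - sum_xy ** 2 / sum_x2       # sum y^2 - (sum xy)^2 / sum x^2

s2_yx = ss_direct / (n - 2)      # mean square deviation from regression
s_yx = math.sqrt(s2_yx)
s_b = s_yx / math.sqrt(sum_x2)   # standard deviation of b
t = b / s_b                      # n - 2 = 3 degrees of freedom

print([round(di, 1) for di in d])    # [0.6, -3.2, 2.0, 3.2, -2.6]
print(round(ss_direct, 2), round(ss_shortcut, 2))
print(round(s2_yx, 2), round(s_yx, 3), round(s_b, 4), round(t, 2))
```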

EXAMPLE 6.2.1 — Following are measurements on heights of soybean plants in a field, a different random selection each week (2):

Age in weeks              1    2    3    4    5    6    7
Height in centimeters     5   13   16   23   33   38   40

Verify these results: X̄ = 4 weeks, Ȳ = 24 cms., Σx² = 28, Σy² = 1,080, Σxy = 172. Compute the sample regression, Ŷ = 6.143X − 0.572 centimeters.

EXAMPLE 6.2.2 — Plot on a graph the sample points for the soybean data, then construct the sample regression line. Do the points lie about equally above and below the line?

EXAMPLE 6.2.3 — Calculate s_b = 0.409 cms./wk. Set the 95% confidence interval for the population regression. Ans. 5.09 to 7.20 cms./wk. (Note that df = n − 2 = 5.)

EXAMPLE 6.2.4 — The soybean data constitute a growth curve. Do you suppose the population growth curve is really straight? How would you design an experiment to get a growth curve of the blood pressure in Iowa women?

EXAMPLE 6.2.5 — Eighteen samples of soil were prepared with varying amounts of inorganic phosphorus, X. Corn plants, grown in each soil, were harvested at the end of 38 days and analyzed for phosphorus content. From this was estimated the plant-available phosphorus in the soil. Nine of the observations, adapted for ease of computation, are shown in this table.

Inorganic phosphorus in soil (ppm), X                1    4    5    9   13   11   23   23   28
Estimated plant-available phosphorus (ppm), Y       64   71   54   81   93   76   77   95  109

Calculate b = 1.417, s_b = 0.395, t = 3.59**.

6.3 — Shortcut methods of computation in regression. Since regression computations are tedious, a calculating machine is almost essential. In fitting a regression, the following six basic quantities must be obtained:

n, X̄, Ȳ, Σx², Σy², Σxy

You already know shortcut methods of computing Σx² and Σy² without finding the individual deviations x and y. A similar method exists for finding Σxy, based on the algebraic identity

$$\Sigma xy = \Sigma(X - \bar{X})(Y - \bar{Y}) = \Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{n}.$$
Note that the correction term may be larger than ΣXY, making Σxy negative. This indicates a downward sloping regression line.

In table 6.3.1 the regression of blood pressure on age has been 
recomputed using these shortcuts. 


TABLE 6.3.1
Machine Computation of a Linear Regression

Age (years), X               35    45    55    65    75     ΣX = 275     n = 5     X̄ = 55
Blood pressure (units), Y   114   124   143   158   166     ΣY = 705               Ȳ = 141

ΣX² = 16,125            ΣY² = 101,341           ΣXY = 40,155
(ΣX)²/n = 15,125        (ΣY)²/n = 99,405        (ΣX)(ΣY)/n = 38,775
Σx² = 1,000             Σy² = 1,936             Σxy = 1,380

b = Σxy/Σx² = 1,380/1,000 = 1.38 units per year of age
Ŷ = Ȳ + b(X − X̄) = 141 + 1.38(X − 55) = 65.1 + 1.38X
Σd²_y·x = Σy² − (Σxy)²/Σx² = 1,936 − (1,380)²/1,000 = 31.60
s²_y·x = Σd²_y·x/(n − 2) = 31.60/3 = 10.53
s_y·x = √10.53 = 3.245 units
s_b = s_y·x/√Σx² = 3.245/√1,000 = 0.102
t = b/s_b = 1.38/0.102 = 13.5**, df = n − 2 = 3


The figures shown under the sample data are all that need be written down. In most calculating machines, ΣX and ΣX² can be accumulated in a single run, ΣY and ΣY² in a second run, and ΣXY in a third, without writing down any intermediate figures. With small samples in which X and Y have no more than three significant figures, some machines will accumulate ΣX, ΣY, ΣX², 2ΣXY, and ΣY² in one run.
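The single-run accumulation described above has a direct counterpart in code. The sketch below uses a hypothetical function name. It accumulates the five raw sums in one pass and then subtracts the correction terms, reproducing the figures of table 6.3.1.

```python
def machine_sums(X, Y):
    """Accumulate the raw sums in one pass, then subtract correction terms."""
    n = sX = sY = sXX = sYY = sXY = 0
    for Xi, Yi in zip(X, Y):
        n += 1
        sX += Xi
        sY += Yi
        sXX += Xi * Xi
        sYY += Yi * Yi
        sXY += Xi * Yi
    return {
        "n": n, "sum_X": sX, "sum_Y": sY,
        "sum_X2": sXX, "sum_Y2": sYY, "sum_XY": sXY,
        "x2": sXX - sX * sX / n,    # sum of squares of deviations of X
        "y2": sYY - sY * sY / n,    # sum of squares of deviations of Y
        "xy": sXY - sX * sY / n,    # sum of products; negative means a downward slope
    }

s = machine_sums([35, 45, 55, 65, 75], [114, 124, 143, 158, 166])
print(s["sum_X2"], s["sum_Y2"], s["sum_XY"])   # 16125 101341 40155
print(s["x2"], s["y2"], s["xy"])               # 1000.0 1936.0 1380.0
print(s["xy"] / s["x2"])                       # 1.38
```

One design caution: when the means are large relative to the spread of the data, subtracting a large correction term from a large raw sum can lose precision in floating-point arithmetic. In that situation, working from deviations, as in table 6.2.1, is the safer choice.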

EXAMPLE 6.3.1 — The data show the initial weights and gains in weight (grams) of 15 female rats on a high protein diet from the 24th to 84th day of age. The point of interest in these data is whether the gain in weight depends to some extent on initial weight. If so, feeding experiments on female rats can be made more precise by taking account of the initial weights of the rats, either by pairing on initial weight or by adjusting for differences in initial weight in the analysis. Calculate b by the shortcut method and test its significance. Ans. b = 1.0641, t = b/s_b = 2.02 with 13 df, not quite significant at the 5% level.

Rat Number            1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
Initial weight, X    50   64   76   64   74   60   69   68   56   48   57   59   46   45   65
Gain, Y             128  159  158  119  133  112   96  126  132  118  107  106   82  103  104
EXAMPLE 6.3.2 — Speed records attained in the Indianapolis Memorial Day automobile races, 1911-1941, are as follows, in miles per hour:

Year   X   Speed, Y      Year   X   Speed, Y      Year   X   Speed, Y
1911   0    74.6         1922  11    94.5         1932  21   104.1
1912   1    78.7         1923  12    91.0         1933  22   104.2
1913   2    75.9         1924  13    98.2         1934  23   104.9
1914   3    82.5         1925  14   101.1         1935  24   106.2
1915   4    89.8         1926  15    95.9         1936  25   109.1
1916   5    83.3         1927  16    97.5         1937  26   113.6
1917   6     *           1928  17    99.5         1938  27   117.2
1918   7     *           1929  18    97.6         1939  28   115.0
1919   8    88.1         1930  19   100.4         1940  29   114.3
1920   9    88.6         1931  20    96.6         1941  30   115.1
1921  10    89.6

* No races

The years have been coded by subtracting 1911 from each. Calculate Σx² = 2,325.03, Σy² = 4,039.81, Σxy = 2,971.23, Ŷ = 1.278X + 77.57 miles per hour.

6.4 — The mathematical model in linear regression. In standard linear regression, three assumptions are made about the relation between Y and X:

1. For each selected X there is a normal distribution of Y from which the sample value of Y is drawn at random. If desired, more than one Y may be drawn from each distribution.

2. The population of values of Y corresponding to a selected X has a mean μ that lies on the straight line μ = α + β(X − X̄) = α + βx, where α and β are parameters (to be explained presently).

3. In each population the standard deviation of Y about its mean α + βx has the same value, often denoted by σ_y·x.

The mathematical model is specified concisely by the equation

$$Y = \alpha + \beta x + \varepsilon,$$

where ε is a random variable drawn from N(0, σ²_y·x).

In this model, Y is the sum of a random part, ε, and a part fixed by x. The fixed part, according to assumption number 2 above, determines the means of the populations sampled, one mean for each x. These means lie on the straight line represented by μ = α + βx, the population regression line. The parameter α is the mean of the population that corresponds to x = 0; thus α specifies the height of the line when X = X̄. β is the slope of the regression line, the change in Y per unit increase in X. As for the variable part of Y, ε is drawn at random from N(0, σ²_y·x); it is independent of X and normally distributed, as the symbol N signifies.
Fig. 6.4.1. Representation of the linear regression model. The normal distribution of Y about the regression line α + βx is shown for four selected values of X.


Figure 6.4.1 gives a schematic representation of these populations. For each of four selected values of X the normal distribution of Y about its mean μ = α + βx is sketched. These normal distributions would all coincide if their means were superimposed.

For non-mathematicians, the model is best explained by an arithmetical construction. Assign to X the values 0, 2, 3, 7, 8, 10, as in table 6.4.1. This is done quite arbitrarily; the manner in which X is fixed has no bearing on the illustration.

Next, calculate X̄ and the deviations, x = X − X̄, in column 2.

Now take β = 0.5; this implies that the means of the populations are to increase one-half unit with each unit change in x. From this, column 3 is calculated.

Choose α = 4, meaning that at x = 0 the population regression is 4 units above the X-axis.

The fixed X together with α and β determine the succession of means in column 4. These are indicated by open circles on the population regression line (the dotted line) of figure 6.4.2. So far all quantities are fixed, without sampling variation.

Coming now to the variable part of Y, the ε are drawn at random from a table of random normal deviates with mean zero and σ_y·x = 1. The values which we obtained were 1.1, −1.3, −1.1, 1.0, 0, and −1.0, as shown in column 5 of table 6.4.1. Column 6 contains the sample values of Y, each item being the sum of the fixed part in column 4 and the corresponding random part in column 5. The sample points are plotted as black circles in the figure.

TABLE 6.4.1
Construction of a Sample From Y = α + βx + ε, With α = 4, β = 0.5,
and ε Drawn From N(0, 1)

X      x      βx = 0.5x    α + βx = 4 + 0.5x    ε       Y = α + βx + ε
(1)    (2)    (3)          (4)                  (5)     (6)
0      -5     -2.5         1.5                   1.1    2.6
2      -3     -1.5         2.5                  -1.3    1.2
3      -2     -1.0         3.0                  -1.1    1.9
7       2      1.0         5.0                   1.0    6.0
8       3      1.5         5.5                   0.0    5.5
10      5      2.5         6.5                  -1.0    5.5

Calculations of estimates for sample regression, Y on X

ΣX = 30             ΣY = 22.7
X̄ = 5              Ȳ = 3.78
ΣX² = 226          ΣXY = 149.1            ΣY² = 108.31
(ΣX)²/n = 150       (ΣX)(ΣY)/n = 113.5     (ΣY)²/n = 85.88
Σx² = 76            Σxy = 35.6             Σy² = 22.43

b = Σxy/Σx² = 35.6/76 = 0.468
Ŷ = 3.78 + 0.468(X − 5) = 1.44 + 0.468X
Σd²_y·x = Σy² − (Σxy)²/Σx² = 22.43 − (35.6)²/76 = 5.75
s²_y·x = Σd²_y·x/(n − 2) = 5.75/4 = 1.44;  s_y·x = √1.44 = 1.20

The calculations of Ȳ and b are given under table 6.4.1. The population value α = 4 is estimated by Ȳ = 3.78. The sample regression line passes through the point (X̄, Ȳ), (5, 3.78). The slope β = 0.5 is estimated by b = 0.468. The solid line in figure 6.4.2 is the sample regression line. It is nearly parallel to the population line but lies below it because of the underestimation of α. The discrepancies between the two lines are due wholly to the random sampling of the ε.

Fig. 6.4.2. Population regression μ = 4 + 0.5x; sample regression Ŷ = 3.78 + 0.468x.
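The arithmetical construction of table 6.4.1 can be sketched in code. The function names are hypothetical. The first part reuses the six deviates printed in the table, so it should reproduce column 6 and the fitted values, which come out close to the text's Ȳ = 3.78, b = 0.468 and s_y·x = 1.20. The second part draws fresh deviates from N(0, 1) with Python's `random.Random.gauss`; the seed is an arbitrary choice, and the resulting estimates will scatter around α = 4 and β = 0.5 in the same way.

```python
import math
import random

def model_sample(X, alpha, beta, eps):
    """Y = alpha + beta*x + eps with x = X - mean(X), as in section 6.4."""
    X_mean = sum(X) / len(X)
    return [alpha + beta * (Xi - X_mean) + e for Xi, e in zip(X, eps)]

def fit_line(X, Y):
    """Return (Y_mean, b, s_yx) for the sample regression of Y on X."""
    n = len(X)
    X_mean, Y_mean = sum(X) / n, sum(Y) / n
    sum_x2 = sum((Xi - X_mean) ** 2 for Xi in X)
    sum_xy = sum((Xi - X_mean) * (Yi - Y_mean) for Xi, Yi in zip(X, Y))
    b = sum_xy / sum_x2
    d = [Yi - (Y_mean + b * (Xi - X_mean)) for Xi, Yi in zip(X, Y)]
    s_yx = math.sqrt(sum(di * di for di in d) / (n - 2))
    return Y_mean, b, s_yx

X = [0, 2, 3, 7, 8, 10]
eps = [1.1, -1.3, -1.1, 1.0, 0.0, -1.0]          # deviates of table 6.4.1
Y = model_sample(X, alpha=4, beta=0.5, eps=eps)
print([round(v, 1) for v in Y])                  # [2.6, 1.2, 1.9, 6.0, 5.5, 5.5]
print([round(v, 3) for v in fit_line(X, Y)])

# A fresh sample from the same populations (seed chosen arbitrarily):
rng = random.Random(2024)
Y_new = model_sample(X, 4, 0.5, [rng.gauss(0, 1) for _ in X])
print([round(v, 3) for v in fit_line(X, Y_new)])
```

Repeating the last three lines with different seeds is a quick way to see the sampling variation of Ȳ and b about α and β.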

EXAMPLE 6.4.1 — In table 6.4.1, b = 0.468. Calculate the six deviations from regression, d_y·x, and identify each with the distance of the corresponding point from the sample regression line. The sum of the deviations should be zero and the sum of their squares about 5.75.

EXAMPLE 6.4.2 – Construct a sample with α = 6 and β = −1. The negative β means that the regression will slope downwards to the right. Take X = 1, 2, ..., 9, X̄ being 5. By using table 3.2.1, draw ε randomly from N(0, 5). Make a table showing the calculation of the sample of Y. Graph the population regression and the sample points. Save your work for further use.

6.5 – Ŷ as an estimator of μ = α + βx. For any x, the computed value Ŷ estimates the corresponding μ = α + βx. For example, we have already seen that at x = 0 (for which X = 5), Ŷ₅ = Ȳ estimates μ₅ = α. As another example, at x = 2, for which X = 7, Ŷ₇ = 1.44 + (0.468)(7) = 4.72 estimates μ₇ = 4 + (0.5)(2) = 5.

More generally,

$$\hat{Y} - \mu = (\bar{Y} - \alpha) + (b - \beta)x \qquad (6.5.1)$$

Thus, the difference between Ŷ and the corresponding μ has two sources, both due to the random ε. One is the difference between the elevations of the sample and population regression lines (Ȳ − α); the other, the difference between the two slopes (b − β).

Estimates of μ are often made at an X lying between two of the fixed X whose Y were sampled. For example, at X = 4,

$$\hat{Y}_4 = 1.44 + (0.468)(4) = 3.31,$$

locating a point on the sample regression line perpendicularly above X = 4. Here we are estimating μ in a population not sampled. There is no sample evidence for such an estimate; it rests on the judgment of the investigator, who has reason to believe that the intermediate population has a μ lying on the sampled regression, α + βx.

Using the same argument, one may estimate μ at an X extrapolated beyond the range of the fixed X. Thus, at X = 12,

$$\hat{Y}_{12} = 1.44 + (0.468)(12) = 7.06$$

Extrapolation involves two extra hazards. Since x tends to be large for extrapolated values, equation 6.5.1 shows that the term (b − β)x may make the difference (Ŷ − μ) large. Secondly (and this is usually the more serious hazard), the population regression of means may actually be curved to an extent that is small within the limits of the sample, but becomes pronounced when we move beyond these limits, so that results given by a straight-line regression are badly wrong.
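Equation 6.5.1 can be checked numerically at the three values of X discussed above. This Python sketch uses the rounded sample values Ȳ = 3.78 and b = 0.468, together with the population values α = 4 and β = 0.5 used to construct table 6.4.1, and prints the error Ŷ − μ both directly and from the right side of (6.5.1).

```python
# Rounded sample line of table 6.4.1 and the population line it estimates.
Y_bar, b, X_bar = 3.78, 0.468, 5
alpha, beta = 4.0, 0.5

def y_hat(X):
    """Sample estimate of mu at X: Y_bar + b(X - X_bar) = 1.44 + 0.468X."""
    return Y_bar + b * (X - X_bar)

def mu(X):
    """Population mean alpha + beta * x, with x = X - X_bar."""
    return alpha + beta * (X - X_bar)

for X in (7, 4, 12):   # a sampled X, an interpolated X, an extrapolated X
    x = X - X_bar
    error = y_hat(X) - mu(X)
    parts = (Y_bar - alpha) + (b - beta) * x   # right side of (6.5.1)
    print(f"X = {X:2d}: Y-hat = {y_hat(X):.2f}, mu = {mu(X):.2f}, "
          f"error = {error:+.3f}, (6.5.1) gives {parts:+.3f}")
```

The error is about −0.19 at X = 4 and about −0.44 at X = 12; the (Ȳ − α) part is the same everywhere, while the (b − β)x part grows in proportion to x.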

The value of Ŷ also enables us to judge whether an individual observed Y is above or below its average value for the X in question. Look, for example, at the first point on the left of the graph (figure 6.4.2): Y₀ = 2.6, to be compared with Ŷ₀ = 1.44. The positive deviation, d_{y·x} = Y₀ − Ŷ₀ = 1.16, shows that Y₀ exceeds its estimated value by 1.16 units. Algebraically,

$$d_{y\cdot x} = Y - \hat{Y} = \alpha + \beta x + \varepsilon - (\bar{Y} + bx) = (\alpha - \bar{Y}) + (\beta - b)x + \varepsilon$$

Thus, Y − Ŷ is, as would be expected, an estimate of the corresponding normal deviate ε, but is affected also by the errors in Ȳ and b. In the constructed example, ε₀ = 1.1, so that for this point Y₀ − Ŷ₀ = 1.16 is close. In large samples, the errors in Ȳ and b become small, and the residual Y − Ŷ is a good estimate of the corresponding ε.
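The comparison between Y − Ŷ and ε can be made for all six points of table 6.4.1, which also carries out example 6.4.1. A minimal Python sketch, with ε taken from column 5 of the table and b computed from the data rather than rounded:

```python
# Data of table 6.4.1: fixed X, the deviates used to build Y, and Y itself.
X = [0, 2, 3, 7, 8, 10]
eps = [1.1, -1.3, -1.1, 1.0, 0.0, -1.0]
Y = [2.6, 1.2, 1.9, 6.0, 5.5, 5.5]

n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((Xi - X_bar) ** 2 for Xi in X)
b = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y)) / Sxx

# Deviations from the sample line, d = Y - Y_hat, set beside the true epsilon.
d = [Yi - (Y_bar + b * (Xi - X_bar)) for Xi, Yi in zip(X, Y)]
for Xi, di, ei in zip(X, d, eps):
    print(f"X = {Xi:2d}: d = {di:+.2f}, epsilon = {ei:+.1f}")

print(f"sum of d = {sum(d):.6f}")                      # zero apart from rounding
print(f"sum of d^2 = {sum(di * di for di in d):.2f}")  # about 5.75
```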

This examination of deviations from a fitted regression is often useful. A doctor's statement, "For a woman of your age, your blood pressure is normal," would imply that Y − Ŷ was zero, or near to it. A value of Y that was quite usual in a woman aged 65 might cause a doctor to prescribe treatment if it occurred in a woman aged 35, because for this woman Y − Ŷ would be exceptionally high.

EXAMPLE 6.5.1 – For your sample in example 6.4.2, calculate Ȳ and b, then plot the sample regression line on your graph. Calculate the deviations d_{y·x} and compare them with the corresponding ε. It is a partial check on your accuracy if Σd_{y·x} = 0.

EXAMPLE 6.5.2 – Using the blood pressure data of section 6.2, estimate μ at age 30 years. Ans. 106.5 units.

EXAMPLE 6.5.3 – Calculate Y_A = Y − bx, called adjusted Y, for each age group in table 6.2.2. Verify your results by the sum, ΣY_A = ΣY. Suggest several possible reasons for the differences among adjusted Y.

6.6 – The estimator of σ²_{y·x}. As noted earlier, the quantity

$$s_{y\cdot x}^2 = \Sigma d_{y\cdot x}^2/(n - 2)$$

is an unbiased estimator of σ²_{y·x}, the variance of the ε's. One way of remembering the divisor (n − 2) is to note that in fitting the line we have two disposable constants, the estimates of α and β, whose values we choose so as to make the d_{y·x} as small as possible. If there are only two points (Y₁, X₁) and (Y₂, X₂), the fitted line goes through both points exactly. The d_{y·x} and their sum of squares are then zero, no matter how large the true σ_{y·x} is. In other words, there are no degrees of freedom remaining for estimating σ²_{y·x}.

In the constructed example (table 6.4.1), s²_{y·x} was found to be 1.44, with 4 d.f., as an estimate of σ²_{y·x} = 1. This gives 1.20 as the estimate of σ_{y·x} = 1.

The estimated variance in the original sample of values of Y is s²_y = 22.43/5 = 4.49. By utilization of the knowledge of X, this variance is reduced to s²_{y·x} = 1.44. It is sometimes said that a fraction (4.49 − 1.44)/4.49, or about 68%, of the variation in Y is associated with the linear regression on X, the remaining 32% being independent of X. This statement is useful when the objective is to understand why Y varies and it is known that X is one of the causes of the variation in Y.

The nature of s²_{y·x} is also made clearer by some algebra. For the ith member of the sample,

$$\varepsilon_i = Y_i - \alpha - \beta x_i, \qquad d_{y\cdot x,i} = Y_i - \bar{Y} - bx_i = y_i - bx_i$$

Write

$$\varepsilon_i = Y_i - \alpha - \beta x_i = (Y_i - \bar{Y} - bx_i) + (\bar{Y} - \alpha) + (b - \beta)x_i = (y_i - bx_i) + (\bar{Y} - \alpha) + (b - \beta)x_i$$

Square both sides and sum over the n values in the sample. On the right side there are three squared terms and three product terms. The squared terms give

$$\Sigma(y_i - bx_i)^2 + \Sigma(\bar{Y} - \alpha)^2 + \Sigma(b - \beta)^2x_i^2$$

The factors (Ȳ − α)² and (b − β)² are constant for all members of the sample and can be taken outside the Σ sign. This gives, for the squared terms,

$$\Sigma(y_i - bx_i)^2 + n(\bar{Y} - \alpha)^2 + (b - \beta)^2\Sigma x_i^2$$

Remarkably, the three cross-product terms all vanish when summed over the sample. For example,

$$2\Sigma(y_i - bx_i)(\bar{Y} - \alpha) = 2(\bar{Y} - \alpha)\Sigma(y_i - bx_i) = 0$$

since Σy_i = 0 and Σx_i = 0. Further,

$$2\Sigma(\bar{Y} - \alpha)(b - \beta)x_i = 2(\bar{Y} - \alpha)(b - \beta)\Sigma x_i = 0,$$

$$2\Sigma(y_i - bx_i)(b - \beta)x_i = 2(b - \beta)\Sigma x_i(y_i - bx_i) = 2(b - \beta)(\Sigma x_iy_i - b\Sigma x_i^2)$$

which vanishes since b = Σx_iy_i/Σx_i². Thus, finally,

$$\Sigma\varepsilon_i^2 = \Sigma(Y_i - \alpha - \beta x_i)^2 = \Sigma(Y_i - \bar{Y} - bx_i)^2 + n(\bar{Y} - \alpha)^2 + (b - \beta)^2\Sigma x_i^2 \qquad (6.6.1)$$

Rearranging,

$$\Sigma d_{y\cdot x}^2 = \Sigma(Y_i - \bar{Y} - bx_i)^2 = \Sigma\varepsilon_i^2 - n(\bar{Y} - \alpha)^2 - (b - \beta)^2\Sigma x_i^2$$

On the right side of this equation, each ε_i has mean zero and variance σ²_{y·x}. Thus the term Σε_i² is an estimate of nσ²_{y·x}. The two subtracted terms on the right can be shown to have expected value σ²_{y·x} each. It follows that Σd²_{y·x} is an unbiased estimate of (n − 2)σ²_{y·x}, and on division by (n − 2) provides an unbiased estimate of σ²_{y·x}. This result, namely that s²_{y·x} is unbiased, does not require the ε_i to be normally distributed. Normality is required, however, to prove the standard tests of significance in regression.
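The claim that s²_{y·x} is unbiased without normality can be illustrated by experimental sampling, in the spirit of the constructed example. In this Python sketch the X values, α and β are those of table 6.4.1, but the ε are drawn from a uniform distribution with variance 1 instead of a normal one; the uniform choice, the seed and the number of samples are our own assumptions.

```python
import random

# Hypothetical check: X values of table 6.4.1, alpha = 4, beta = 0.5,
# but epsilon drawn from a uniform distribution on [-sqrt(3), sqrt(3)],
# which has mean 0 and variance 1 without being normal.
X = [0, 2, 3, 7, 8, 10]
alpha, beta, half_width = 4.0, 0.5, 3 ** 0.5
n = len(X)
X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)

def residual_ss(rng):
    """Draw one sample of Y and return the sum of squared deviations d_yx."""
    Y = [alpha + beta * xi + rng.uniform(-half_width, half_width) for xi in x]
    Y_bar = sum(Y) / n
    b = sum(xi * (Yi - Y_bar) for xi, Yi in zip(x, Y)) / Sxx
    return sum((Yi - Y_bar - b * xi) ** 2 for xi, Yi in zip(x, Y))

rng = random.Random(2024)
reps = 20000
ss = [residual_ss(rng) for _ in range(reps)]
mean_s2 = sum(v / (n - 2) for v in ss) / reps    # divisor n - 2
mean_naive = sum(v / n for v in ss) / reps       # divisor n, for contrast
print(f"average of SS/(n-2): {mean_s2:.3f}  (sigma^2 = 1)")
print(f"average of SS/n:     {mean_naive:.3f}  (compare (n-2)/n = {(n - 2) / n:.3f})")
```

The average with divisor n − 2 settles near 1, while the divisor n falls short by the factor (n − 2)/n, as the algebra above predicts.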
6.7 – The method of least squares. The choice of Ȳ and b to estimate the parameters α and β is an application of a principle widely used in problems of statistical estimation and known as the method of least squares. To explain this method, let α̂ and β̂ denote any two estimators of α and β that we might consider. For the pair of observations (Y, X) the quantity

$$Y - \hat{\alpha} - \hat{\beta}x$$

measures the amount by which the fitted regression is in error in estimating Y. In the method of least squares, α̂ and β̂ are chosen so as to minimize the sum of the squares of these errors, taken over the sample. That is, we minimize

$$\Sigma(Y - \hat{\alpha} - \hat{\beta}x)^2 \qquad (6.7.1)$$

About 150 years ago the scientist Gauss showed that estimators obtained in this way are (i) unbiased, and (ii) have the smallest standard errors of any unbiased estimators that are linear expressions in the Y's. Gauss' proof does not require the Y's to be normally distributed, but merely that the ε's are independent with means zero and variances σ²_{y·x}.

The result that (6.7.1) is minimized by taking α̂ = Ȳ and β̂ = b is easily verified by quoting a previous result (6.6.1, p. 146). Since the algebraic equality (6.6.1) holds for any pair of values α, β, the equation remains valid if we replace α by α̂ and β by β̂. Hence, quoting (6.6.1),

$$\Sigma(Y - \hat{\alpha} - \hat{\beta}x)^2 = \Sigma(Y - \bar{Y} - bx)^2 + n(\bar{Y} - \hat{\alpha})^2 + (b - \hat{\beta})^2\Sigma x^2$$

The first term on the right is the sum of squares of the errors or residuals that we obtain if we take α̂ = Ȳ and β̂ = b. The two remaining terms on the right are never negative, and their sum is zero only when α̂ = Ȳ and β̂ = b. This proves that the choice of Ȳ and b minimizes (6.7.1).
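A numerical check of this argument, in Python, using the data of table 6.4.1: for several trial pairs (α̂, β̂) the sum of squares (6.7.1) is computed directly and from the right side of (6.6.1). The trial values other than (Ȳ, b) are arbitrary choices made for illustration.

```python
# Data of table 6.4.1, with x measured from X_bar = 5.
X = [0, 2, 3, 7, 8, 10]
Y = [2.6, 1.2, 1.9, 6.0, 5.5, 5.5]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b = sum(xi * (Yi - Y_bar) for xi, Yi in zip(x, Y)) / Sxx

def sse(a_hat, b_hat):
    """Left side of (6.7.1): sum of (Y - a_hat - b_hat * x)^2."""
    return sum((Yi - a_hat - b_hat * xi) ** 2 for xi, Yi in zip(x, Y))

def decomposition(a_hat, b_hat):
    """Right side of (6.6.1) with alpha and beta replaced by a_hat and b_hat."""
    return sse(Y_bar, b) + n * (Y_bar - a_hat) ** 2 + (b - b_hat) ** 2 * Sxx

# (Y_bar, b) first; the other trial pairs are arbitrary.
for a_hat, b_hat in [(Y_bar, b), (4.0, 0.5), (3.0, 0.6), (5.0, 0.2)]:
    print(f"a = {a_hat:.3f}, b = {b_hat:.3f}: direct {sse(a_hat, b_hat):.4f}, "
          f"via (6.6.1) {decomposition(a_hat, b_hat):.4f}")
```

With α̂ = 4 and β̂ = 0.5, the population values, the direct sum is Σε² = 6.11, which (6.6.1) splits as about 5.75 + 0.28 + 0.08.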

6.8 – The value of b in some simple cases. The expression for b, Σxy/Σx², is unfamiliar at first sight. It is not obviously related to the quantity β of which b is an estimate, nor is it clear that this is the estimate that common sense would suggest to someone who had never heard of least squares. A general expression relating b and β and an examination of a few simple cases may make b more familiar.

Denote the members of the sample by (Y_i, X_i), where the subscript i goes from 1 to n. The numerator of b is Σx_iy_i = Σx_i(Y_i − Ȳ) = Σx_iY_i − ȲΣx_i. Since the term ȲΣx_i vanishes, because Σx_i = 0, the numerator of b may be written Σx_iY_i. Now substitute Y_i = α + βx_i + ε_i. This gives

$$b = \frac{\Sigma x_iY_i}{\Sigma x_i^2} = \frac{\alpha\Sigma x_i + \beta\Sigma x_i^2 + \Sigma x_i\varepsilon_i}{\Sigma x_i^2} = \beta + \frac{\Sigma x_i\varepsilon_i}{\Sigma x_i^2},$$

the term in α vanishing because Σx_i = 0. Thus b differs from β by a linear expression in the ε_i. If the ε_i were all zero, b would coincide with β. Further, since the ε_i have zero means in the population, it follows that b is an unbiased estimate of β.

Turning to the simplest case, suppose that the sample consists of the values (Y₁, 1) and (Y₂, 2). The obvious estimate of the change in Y per unit increase in X is Y₂ − Y₁. What does b give? Since X̄ = 3/2, the deviations are x₁ = −1/2, x₂ = +1/2, giving Σx² = 1/2. Thus

$$b = \frac{-\tfrac{1}{2}Y_1 + \tfrac{1}{2}Y_2}{\tfrac{1}{2}} = Y_2 - Y_1,$$

in agreement.

With three values (Y₁, 1), (Y₂, 2), (Y₃, 3) we might argue that Y₂ − Y₁ and Y₃ − Y₂ are both estimates of the change in Y per unit change in X. Since there seems no reason to do otherwise, we might average them, getting (Y₃ − Y₁)/2 as our estimate. To compare this with the least squares estimate, we have x₁ = −1, x₂ = 0, x₃ = +1. This gives ΣxY = Y₃ − Y₁ and Σx² = 2, so that b = (Y₃ − Y₁)/2, again in agreement with the common-sense approach. Notice that Y₂ is not used in estimating the slope. Y₂ is useful in providing a check on whether the population regression line is straight. If it is straight, Y₂ should be equal to the average of Y₁ and Y₃, apart from sampling errors. The difference Y₂ − (Y₁ + Y₃)/2 is therefore a measure of the curvature (if any) of the population regression.

Continuing in this way for the sample (Y₁, 1), (Y₂, 2), (Y₃, 3), (Y₄, 4), we have three simple estimates of β, namely (Y₂ − Y₁), (Y₃ − Y₂), and (Y₄ − Y₃). If we average them as before, we get (Y₄ − Y₁)/3. This is disconcerting, since this estimate does not use either Y₂ or Y₃. What does least squares give? The values of x are −3/2, −1/2, +1/2, and +3/2, and the estimate may be verified to be

$$b = (3Y_4 + Y_3 - Y_2 - 3Y_1)/10.$$

The least squares result can be explained as follows. The quantity (Y₄ − Y₁)/3 is an estimate of β, with variance 2σ²_{y·x}/9. The sample supplies another independent estimate (Y₃ − Y₂), with variance 2σ²_{y·x}. In combining these two estimates, the principle of least squares weights them inversely as their variances, assigning greater weight to the more accurate estimate. This weighted estimate is

$$\frac{9(Y_4 - Y_1)/3 + (Y_3 - Y_2)}{9 + 1} = \frac{3Y_4 + Y_3 - Y_2 - 3Y_1}{10} = b$$

As these examples show, it is easy to construct unbiased estimates of β by simple, direct methods. The least squares approach automatically produces the estimate with the smallest standard error.
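The four-point case can be checked with exact rational arithmetic. This Python sketch writes b as weights on Y₁, ..., Y₄, compares them with the inverse-variance combination of (Y₄ − Y₁)/3 and Y₃ − Y₂, and compares variances in units of σ²_{y·x}, treating the Y as independent with a common variance.

```python
from fractions import Fraction as F

# Least squares slope for X = 1, 2, 3, 4 written as weights on Y1..Y4.
X = [1, 2, 3, 4]
X_bar = F(sum(X), len(X))
x = [Xi - X_bar for Xi in X]              # -3/2, -1/2, 1/2, 3/2
Sxx = sum(xi * xi for xi in x)            # 5
ls_weights = [xi / Sxx for xi in x]       # b = sum(w_i * Y_i)

# Inverse-variance combination of (Y4 - Y1)/3 (weight 9) and Y3 - Y2 (weight 1).
end_slope = [F(-1, 3), F(0), F(0), F(1, 3)]
mid_slope = [F(0), F(-1), F(1), F(0)]
combined = [(9 * e + m) / 10 for e, m in zip(end_slope, mid_slope)]

def variance_factor(weights):
    """Variance of sum(w_i * Y_i) in units of sigma^2, for independent Y_i."""
    return sum(w * w for w in weights)

print("least squares weights:", [str(w) for w in ls_weights])
print("weighted combination: ", [str(w) for w in combined])
print("Var(b) / sigma^2 =", variance_factor(ls_weights))
print("Var((Y4 - Y1)/3) / sigma^2 =", variance_factor(end_slope))
```

Both lists of weights come out as −3/10, −1/10, 1/10, 3/10, and the variance of b, σ²_{y·x}/5, is below the 2σ²_{y·x}/9 of the simple average (Y₄ − Y₁)/3.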

Remember that b estimates the average change in Y per unit increase in X. Reporting a value of b requires that both units be stated, such as "systolic blood pressure per year of age."
6.9 – The situation when X varies from sample to sample. Often the investigator does not select the values of X. Instead, he draws a sample from some population, then measures two characters Y and X for each member of the sample. In our illustration, the sample is a sample of apple trees in which the relation between the percentage of wormy fruits Y on a tree and the size X of its fruit crop is being investigated. In such applications the investigator realizes that if he drew a second sample, the values of X in that sample would differ from those in the first sample. In the results presented in preceding sections, we regarded the values of X as essentially fixed. The question is sometimes asked: can these results be used when it is known that the X-values will change from sample to sample?

Fortunately, the answer is yes, provided that for any value of X the corresponding Y satisfies the three assumptions stated at the beginning of section 6.4. For each X, the sample value of Y must be drawn from a normal population that has mean μ = α + βx and constant variance σ²_{y·x}. Under these conditions the calculations for fitting the line, the t-test of b, and the methods given later to construct confidence limits for β and for the position of the true line all apply without change.

Consider, for instance, the accuracy with which β is estimated by b. The standard error of b is σ_{y·x}/√(Σx²). If a second sample of n apple trees were to be drawn, we know that Σx², and hence the standard error of b, would change. That is, when X varies from sample to sample, some samples of size n provide more accurate estimates of β than others. But since the value of Σx² is known for the sample actually drawn, it makes sense to attach to b the standard error σ_{y·x}/√(Σx²), or its estimate s_{y·x}/√(Σx²). By doing so we take account of the fact that our b may be somewhat more accurate or somewhat less accurate than is usual in a sample of size n. In statistical theory this approach is sometimes described as using the conditional distribution of b for the values of X that we obtained in our sample, rather than the general distribution of b in repeated samples of size n.

There is one important distinction between the two cases. Suppose that in a study of families, the heights of pairs of adult brothers (X) and sisters (Y) are measured. An investigator might be interested either in the regression of sister's height on brother's height:

$$\hat{Y} = \bar{Y} + b_{y\cdot x}(X - \bar{X})$$

or in the regression of brother's height on sister's height:

$$\hat{X} = \bar{X} + b_{x\cdot y}(Y - \bar{Y})$$

These two regression lines are different. For a sample of 11 pairs of brothers and sisters, they are shown in figure 7.1.1 (p. 173). The line AB in this figure is the regression of Y on X, while the line CD is the regression of X on Y. Since b_{y·x} = Σxy/Σx² and b_{x·y} = Σxy/Σy², it follows that b_{x·y} is not in general equal to 1/b_{y·x}, as it would have to be to make the slopes of AB and CD identical.
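The distinction is easy to see numerically. The sibling heights plotted in figure 7.1.1 are not reproduced here, so this Python sketch uses the apple data of table 6.9.1 purely as stand-in numbers; any paired sample would serve.

```python
# Stand-in data: the 12 apple trees of table 6.9.1 (the sibling heights behind
# figure 7.1.1 are not given here). X = crop size, Y = per cent wormy.
X = [8, 6, 11, 22, 14, 17, 18, 24, 19, 23, 26, 40]
Y = [59, 58, 56, 53, 50, 45, 43, 42, 39, 38, 30, 27]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((Xi - X_bar) ** 2 for Xi in X)
Syy = sum((Yi - Y_bar) ** 2 for Yi in Y)
Sxy = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y))

b_yx = Sxy / Sxx   # regression of Y on X
b_xy = Sxy / Syy   # regression of X on Y
# Drawn on the same axes (X horizontal), the line of X on Y rises 1/b_xy units
# of Y per unit of X, so the two lines coincide only if b_xy == 1 / b_yx.
print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}, 1/b_yx = {1 / b_yx:.3f}")
```

Here b_{x·y} = −0.766 while 1/b_{y·x} = −0.987, so the two fitted lines are distinct.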
If the sample of pairs (X, Y) is a random one, the investigator may use whichever regression is relevant for his purpose. In predicting brothers' heights from sisters' heights, for instance, he uses the regression of X on Y. If, however, he has deliberately selected his sample of values of one of the variates, say X, then only the regression of Y on X has meaning and stability. There are many reasons for selecting the values of X. The levels of X may represent different amounts of a drug to be applied to groups of animals, or persons of ages 25, 30, 35, 40, 45, selected for convenience in calculating and graphing the regression of Y on age, or a deliberate choice of extremes, so as to make Σx² large and decrease the standard error of b, σ_{y·x}/√(Σx²). Provided that the X are selected without seeing the corresponding Y values, the linear regression line of Y on X is not distorted. Selection of the Y values, on the other hand, can greatly change this regression. Clearly, if we choose Y values that are all equal, the sample regression b of Y on X will be zero whatever the slope of the population regression.
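The contrast between selecting X and selecting Y can be illustrated by experimental sampling. In this Python sketch the population (its intercept, slope, error standard deviation and range of X) and both selection rules are invented for illustration only.

```python
import random

# Hypothetical population: X uniform on (0, 10), Y = 2 + 1.0 * X + N(0, 2).
# The true slope is beta = 1.0; all of these values are illustrative.
rng = random.Random(7)
BETA = 1.0
population = []
for _ in range(20000):
    X = rng.uniform(0, 10)
    population.append((X, 2 + BETA * X + rng.gauss(0, 2)))

def slope(pairs):
    """Least squares regression coefficient of Y on X, sum(xy) / sum(x^2)."""
    n = len(pairs)
    X_bar = sum(X for X, _ in pairs) / n
    Y_bar = sum(Y for _, Y in pairs) / n
    Sxy = sum((X - X_bar) * (Y - Y_bar) for X, Y in pairs)
    Sxx = sum((X - X_bar) ** 2 for X, _ in pairs)
    return Sxy / Sxx

# Selection on X only (deliberate extremes), made without looking at Y.
by_x = [(X, Y) for X, Y in population if X < 2 or X > 8]
# Selection on Y (a narrow band of Y values).
by_y = [(X, Y) for X, Y in population if 6 <= Y <= 8]

print(f"all points:        b = {slope(population):.3f}")
print(f"X selected (ends): b = {slope(by_x):.3f}")
print(f"Y selected (band): b = {slope(by_y):.3f}")
```

With these settings the first two slopes come out close to 1, while selecting on Y pulls the slope far toward zero.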

Turning to the numerical example of table 6.9.1, we find another feature of interest: a regression that is negative instead of positive.


TABLE 6.9.1
Regression of Percentage of Wormy Fruit on Size of Apple Crop

         Size of Crop on Tree    Percentage of    Estimate     Deviation From
Tree     (hundreds of fruits)    Wormy Fruits     of μ         Regression
Number   X                       Y                Ŷ            Y − Ŷ = d_{y·x}
  1       8                      59               56.14          2.86
  2       6                      58               58.17         −0.17
  3      11                      56               53.10          2.90
  4      22                      53               41.96         11.04
  5      14                      50               50.06         −0.06
  6      17                      45               47.03         −2.03
  7      18                      43               46.01         −3.01
  8      24                      42               39.94          2.06
  9      19                      39               45.00         −6.00
 10      23                      38               40.95         −2.95
 11      26                      30               37.91         −7.91
 12      40                      27               23.73          3.27

ΣX = 228            ΣY = 540
X̄ = 19              Ȳ = 45
ΣX² = 5,256         ΣXY = 9,324           ΣY² = 25,522
(ΣX)²/n = 4,332     (ΣX)(ΣY)/n = 10,260   (ΣY)²/n = 24,300
Σx² = 924           Σxy = −936            Σy² = 1,222

$$b = \Sigma xy/\Sigma x^2 = -936/924 = -1.013 \text{ per cent per 100 fruits}$$
$$\hat{Y} = \bar{Y} + b(X - \bar{X}) = 45 - 1.013(X - 19) = 64.247 - 1.013X$$
$$\Sigma d_{y\cdot x}^2 = 1{,}222 - (-936)^2/924 = 273.88$$
$$s_{y\cdot x}^2 = \Sigma d_{y\cdot x}^2/(n - 2) = 273.88/10 = 27.388$$
It is generally thought that the percentage of fruits attacked by codling moth larvae is greater on apple trees bearing a small crop. Apparently the density of the flying moths tends towards uniformity, so that the chance of attack for any particular fruit is augmented if there are few fruits on the tree. The data in table 6.9.1 are adapted from the results of an experiment (3) containing evidence about this phenomenon. The 12 trees were all given a calyx spray of lead arsenate followed by 5 cover sprays made up of 3 pounds of manganese arsenate and 1 quart of fish oil per 100 gallons. There is a decided tendency, emphasized in figure 6.9.1, for the percentage of wormy fruits to decrease as the number of apples on the tree increases. In this particular group of trees, the relation of the two variates is even closer than usual.



The new feature in the calculations is the majority of negative products, xy, caused by the tendency of small values of Y to be associated with large values of X. The sample regression coefficient shows that the estimated percentage of wormy apples decreases, as indicated by the minus sign, by 1.013 with each increase of 100 fruits in the crop. The sample regression line, and with it the estimated percentage, falls away from the point O′(X̄, Ȳ) by 1.013 for each unit of crop above 19 hundreds.
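The calculations under table 6.9.1 can be reproduced with the machine formulas in a few lines of Python. The sketch also lists the three largest deviations from regression, the trees discussed in the next paragraph. Exact arithmetic gives Σd²_{y·x} = 273.84, a little below the 273.88 printed under the table; the difference is negligible here.

```python
# Table 6.9.1: crop size X (hundreds of fruits) and per cent wormy Y.
X = [8, 6, 11, 22, 14, 17, 18, 24, 19, 23, 26, 40]
Y = [59, 58, 56, 53, 50, 45, 43, 42, 39, 38, 30, 27]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n                              # 19, 45
Sxx = sum(Xi * Xi for Xi in X) - sum(X) ** 2 / n                   # 924
Syy = sum(Yi * Yi for Yi in Y) - sum(Y) ** 2 / n                   # 1,222
Sxy = sum(Xi * Yi for Xi, Yi in zip(X, Y)) - sum(X) * sum(Y) / n   # -936

b = Sxy / Sxx
a = Y_bar - b * X_bar           # intercept of Y-hat = a + bX
ss_dev = Syy - Sxy ** 2 / Sxx   # sum of squared deviations from regression
s2_yx = ss_dev / (n - 2)

print(f"Y-hat = {a:.3f} + ({b:.3f})X")
print(f"sum d^2 = {ss_dev:.2f}, s^2 = {s2_yx:.3f}")
# Deviations from regression, largest first in absolute value.
d = [(tree, Yi - (a + b * Xi)) for tree, (Xi, Yi) in enumerate(zip(X, Y), start=1)]
for tree, dev in sorted(d, key=lambda t: -abs(t[1]))[:3]:
    print(f"tree {tree}: d = {dev:+.2f}")
```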

The regression line brings into prominence the deviations from this moving average, deviations which measure the failure of crop size to account for variation in the intensity of infestation. Trees number 4, 9, and 11 had notably discrepant percentages of injured fruits, while numbers 2 and 5 performed as expected. According to the model these are random deviations from the average (regression) values, but close observation of the trees during the flight of the moths might reveal some characteristics of this phenomenon. Tree 4 might have been on the side from which the flight originated, or perhaps its shape or situation caused poor applications of the spray. Trees 9 and 11 might have had some peculiarities of conformation of foliage that protected them. Careful study of trees 2 and 5 might throw light on the kind of tree or location that receives normal infestation. This kind of case study usually does not affect the handling of the sample statistics, but it may add to the investigator's knowledge of his experimental material and may afford clues to the improvement of future experiments.

Among attitudes toward experimental data, two extremes exist, both
of which should be avoided: some attend only to minute details of sample
variation, neglecting the summarization of the data and the consequent 
inferences about the population; others are impatient of the data them- 
selves, rushing headlong toward averages and other generalizations. 
Either course fails to yield full information from the experiment. The 
competent investigator takes time to examine each datum together with 
the individual measured. He attempts to distinguish normal variation 
from aberrant observations. He then appraises his summary statistics 
and his population inferences and draws his conclusions against this back- 
ground of sample facts. 

EXAMPLE 6.9.1 – Another group of 12 trees, investigated by Hansberry and Richardson, was sprayed with lead arsenate throughout the season. In addition, the fourth and fifth cover sprays contained 1% mineral oil emulsion and nicotine sulfate at the rate of 1 pint per 100 gallons. The results are shown below. These facts may be verified: ΣX = 240, ΣY = 384, Σx² = 808, Σy² = 1,428, Σxy = −582, regression coefficient = −0.7203, Ŷ = 46.41 − 0.7203X, Y − Ŷ for the first tree = 16.40%.

Size of crop, X (hundreds)   15  15  12  26  18  12   8  38  26  19  29  22
Percentage wormy, Y          52  46  38  37  37  37  34  25  22  22  20  14
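The facts listed in example 6.9.1 can be verified with a short Python sketch; the data are the twelve pairs tabulated above.

```python
# Data of example 6.9.1 (Hansberry and Richardson).
X = [15, 15, 12, 26, 18, 12, 8, 38, 26, 19, 29, 22]   # crop, hundreds of fruits
Y = [52, 46, 38, 37, 37, 37, 34, 25, 22, 22, 20, 14]  # per cent wormy
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((Xi - X_bar) ** 2 for Xi in X)
Syy = sum((Yi - Y_bar) ** 2 for Yi in Y)
Sxy = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y))
b = Sxy / Sxx
a = Y_bar - b * X_bar              # intercept of Y-hat = a + bX
first_dev = Y[0] - (a + b * X[0])  # Y - Y-hat for the first tree

print(f"sum X = {sum(X)}, sum Y = {sum(Y)}")
print(f"sum x^2 = {Sxx:.0f}, sum y^2 = {Syy:.0f}, sum xy = {Sxy:.0f}")
print(f"b = {b:.4f}, Y-hat = {a:.2f} + ({b:.4f})X, Y - Y-hat (tree 1) = {first_dev:.2f}")
```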


EXAMPLE 6.9.2 – In table 6.9.1, calculate Σd²_{y·x} = 273.88 by means of the formula given in section 6.2.


EXAMPLE 6.9.3 – The following weights of body and comb of 15-day-old White Leghorn male chicks are adapted from Snedecor and Breneman (4).

Chick number                   1    2    3    4    5    6    7    8    9   10
Body weight (grams), X        83   72   69   90   90   95   95   91   75   70
Comb weight (milligrams), Y   56   42   18   84   56  107   90   68   31   48

Calculate the sample regression equation, Ŷ = 60 + 2.302(X − 83).

EXAMPLE 6.9.4 – Construct the graph of the chick data, plotting body weight along the horizontal axis. Insert the regression line.
6.10 – Interval estimates of β and tests of null hypotheses. Being provided with point estimates of the parameters of the regression population, we turn to their interval estimates and to tests of hypotheses about them.

First in order of utility, there is the sample regression coefficient b, an estimate of β. As seen in section 6.2, in random sampling, b is distributed with a variance estimated by

$$s_b^2 = s_{y\cdot x}^2/\Sigma x^2$$

Thus, in the apple sampling of table 6.9.1,

$$s_b^2 = 27.388/924 = 0.0296; \qquad s_b = 0.172 \text{ per cent per 100 fruits}$$

Moreover, since the quantity (b − β)/s_b follows the t-distribution with n − 2 degrees of freedom, it may be said with 95% confidence that

$$b - t_{0.05}s_b \le \beta \le b + t_{0.05}s_b$$

For the apples, d.f. = 10, t_{0.05} = 2.228, t_{0.05}s_b = (2.228)(0.172) = 0.383,

$$b - t_{0.05}s_b = -1.013 - 0.383 = -1.396 \text{ per cent per 100 fruits},$$
$$b + t_{0.05}s_b = -1.013 + 0.383 = -0.630 \text{ per cent per 100 fruits},$$

and, finally,

$$-1.396 \le \beta \le -0.630$$

If it is said that the population regression coefficient is within these limits, the statement is right unless the sample is one of the divergent kind that occurs about once in 20 trials.

Instead of the interval estimate of β, interest may lie in testing some null hypothesis. While it is now rather obvious that H₀: β = 0 will be rejected, we proceed with the illustration; if there were any other pertinent value of β to be tested, we could use that instead. Since (b − β)/s_b follows the t-distribution, we put

$$t = \frac{b - \beta}{s_b} = \frac{-1.013 - 0}{0.172} = -5.89, \qquad \text{d.f.} = 10$$

The sign is ignored because the table contains both halves of the distribution. H₀ is rejected. One concludes that in the population sampled there is a regression of percentage wormy apples on crop size, the value likely being between −0.630 and −1.396 per cent per 100 fruits.
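The interval and the test can be reproduced as follows. This Python sketch uses the summary values of table 6.9.1 and takes t₀.₀₅ = 2.228 for 10 d.f. from the t table rather than computing it; because s_b is not rounded to 0.172 here, the last printed digit may differ slightly from the text.

```python
import math

# Summary values for the apple data of table 6.9.1.
b = -1.013          # per cent per 100 fruits
s2_yx = 27.388      # residual mean square, 10 d.f.
sum_x2 = 924
t_05 = 2.228        # two-tailed 5% point of t with 10 d.f., from the t table

s_b = math.sqrt(s2_yx / sum_x2)
half_width = t_05 * s_b
t_stat = (b - 0.0) / s_b   # test of H0: beta = 0

print(f"s_b = {s_b:.3f}")
print(f"95% limits for beta: {b - half_width:.3f} to {b + half_width:.3f}")
print(f"t = {t_stat:.2f} with 10 d.f.; reject H0 at 5%: {abs(t_stat) > t_05}")
```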

6.11 – Prediction of the population regression line. Next, we may wish to make inferences about μ = α + βx, that is, about the height of the population regression line at the point X. The sample estimate of μ is μ̂ = Ȳ + bx. The error in the prediction is

$$\hat{\mu} - \mu = (\bar{Y} - \alpha) + (b - \beta)x$$

But since Y = α + βx + ε, we have Ȳ = α + ε̄, giving

$$\hat{\mu} - \mu = \bar{\varepsilon} + (b - \beta)x \qquad (6.11.1)$$

The term ε̄ has variance σ²_{y·x}/n. Further, b is distributed about β with variance σ²_{y·x}/Σx². Finally, the independence of the ε's guarantees that these two sources of error are uncorrelated, so that the variance of their sum is the sum of the two variances. This gives

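$$\sigma_{\hat{\mu}}^2 = \sigma_{y\cdot x}^2\left(\frac{1}{n} + \frac{x^2}{\Sigma x^2}\right)$$

The estimated standard error of μ̂ is

$$s_{\hat{\mu}} = s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{x^2}{\Sigma x^2}} \qquad (6.11.2)$$

with (n − 2) d.f.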

For the apples, s_{y·x} = √27.388, n = 12, and Σx² = 924.

$$s_{\hat{\mu}} = \sqrt{27.388}\,\sqrt{\frac{1}{12} + \frac{x^2}{924}} = \sqrt{2.282 + 0.02964x^2}$$

For trees with a high crop like that of tree 12, x = 21 and s_μ̂ = 3.92%, notably greater than s_μ̂ = 1.51% at x = 0. The reason why s_μ̂ increases as X recedes from X̄ is evident from the term (b − β)x in equation (6.11.1). The effect of any error in b is steadily magnified as x becomes greater.

Corresponding to any μ̂, the point estimate of μ, there is an interval estimate

$$\hat{\mu} - t_{0.05}s_{\hat{\mu}} \le \mu \le \hat{\mu} + t_{0.05}s_{\hat{\mu}}$$

One might wish to estimate the mean percentage of wormy apples, μ, at the point X = 30 hundreds of fruits. If so,

$$x = X - \bar{X} = 30 - 19 = 11 \text{ hundreds of fruits}$$
$$\hat{\mu} = \bar{Y} + bx = 45 - (1.013)(11) = 33.86\%$$
$$t_{0.05}s_{\hat{\mu}} = (2.228)\sqrt{2.282 + (0.02964)(11^2)} = 5.40\%$$
$$33.86 - 5.40 \le \mu \le 33.86 + 5.40$$

Finally,

$$28.46\% \le \mu \le 39.26\%$$

At X = 30 hundreds of fruits, the population mean μ is estimated as 33.86% wormy fruits with 0.95 confidence limits from 28.46% to 39.26%. This confidence interval is represented by AB in figure 6.11.1.

If calculations like this are done for various values of X and if the confidence limits are plotted above and below the sample regression line, one has a confidence belt or zone with curved borders DB and CA in figure 6.11.1. The curves are the branches of a hyperbola. We have 95% confidence that μ at any particular X lies in the belt. The figure emphasizes the increasing hazard of making predictions at X far removed from X̄.
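The same steps can be repeated for any X, which is how the belt in figure 6.11.1 is traced out. A minimal Python sketch using the summary values of table 6.9.1, with t₀.₀₅ = 2.228 again taken from the t table; it reproduces the limits at X = 30 and the standard errors at x = 0 and at tree 12 (x = 21).

```python
import math

# Summary values for table 6.9.1.
n, X_bar, Y_bar, b = 12, 19, 45, -1.013
s2_yx, sum_x2 = 27.388, 924
t_05 = 2.228   # 5% point of t with n - 2 = 10 d.f.

def mean_interval(X):
    """Estimate of mu at X with its standard error (6.11.2) and 95% limits."""
    x = X - X_bar
    mu_hat = Y_bar + b * x
    se = math.sqrt(s2_yx * (1 / n + x ** 2 / sum_x2))
    return mu_hat, se, mu_hat - t_05 * se, mu_hat + t_05 * se

for X in (19, 30, 40):   # x = 0, 11 and 21 (tree 12)
    mu_hat, se, lo, hi = mean_interval(X)
    print(f"X = {X}: mu-hat = {mu_hat:.2f}, s = {se:.2f}, limits {lo:.2f} to {hi:.2f}")
```

Evaluating mean_interval over a grid of X and plotting the limits would give the two hyperbolic borders of the belt.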


6.12 – Prediction of an individual Y. A further use of regression is to predict the individual value of Y for a new member of the population for which X has been measured. The predicted value is again Ŷ = Ȳ + bx, but since Y = α + βx + ε, the error of the prediction now becomes

$$\hat{Y} - Y = (\bar{Y} - \alpha) + (b - \beta)x - \varepsilon$$

The random element ε for the new member is an additional source of uncertainty. So, the mean square error of the predicted value contains another term, being

$$\frac{\sigma_{y\cdot x}^2}{n} + \frac{x^2\sigma_{y\cdot x}^2}{\Sigma x^2} + \sigma_{y\cdot x}^2$$

Since the term arising from the variance of ε usually dominates, the standard error is usually written as

$$s_{\hat{Y}} = s_{y\cdot x}\sqrt{1 + \frac{1}{n} + \frac{x^2}{\Sigma x^2}} \qquad (6.12.1)$$
It is important not to confuse the two types of prediction. If the regression of weight on height were worked out for a sample of 20-year-old males, the purpose might be to predict the average weight of 20-year-old males of a specific height. This is prediction of μ given X. Alternatively, we might want to predict the weight of a new male whose height is known. This is prediction of an individual Y, given X.

The two prediction problems have the interesting feature that the prediction, Ȳ + bx, is exactly the same in the two problems, but the standard error of the prediction differs (compare equations [6.11.2] and [6.12.1]). To avoid confusion, use the symbols μ̂ and s_μ̂ when a population average is being predicted, and Ŷ and s_Ŷ when an individual Y is being predicted. For example, if you wish to predict the percentage of wormy apples on a tree yielding 30 hundreds of fruits,

$$t_{0.05}s_{\hat{Y}} = 2.228\sqrt{27.388}\sqrt{1 + 1/12 + (11)^2/924} = 12.85\%$$

From this and Ŷ = 33.86%, the confidence interval is given by

$$33.86 - 12.85 \le Y \le 33.86 + 12.85$$

or,

$$21.01\% \le Y \le 46.71\%,$$

as shown in figure 6.11.1. We conclude that for trees bearing 3,000 fruits, population values of percentage wormy fruits fall between 21.01% and 46.71% unless a 1-in-20 chance has occurred in the sampling.

Continuing this procedure, a confidence belt HF and GE for Y may be plotted as in the figure. It is to be observed that all the sample points lie in the belt. In general about 5% of them are expected to fall outside.
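Since only the standard error changes, one function can give both kinds of limits. This Python sketch, again built on the summary values of table 6.9.1 with t₀.₀₅ = 2.228 from the t table, reproduces the 21.01% to 46.71% interval and, for comparison, the narrower interval for the mean at the same X.

```python
import math

# Summary values for table 6.9.1.
n, X_bar, Y_bar, b = 12, 19, 45, -1.013
s2_yx, sum_x2 = 27.388, 924
t_05 = 2.228   # 5% point of t with 10 d.f.

def limits(X, individual):
    """95% limits at X for the mean (6.11.2) or for a new individual Y (6.12.1)."""
    x = X - X_bar
    estimate = Y_bar + b * x               # the same point estimate in both cases
    factor = 1 / n + x ** 2 / sum_x2
    if individual:
        factor += 1                        # extra variance of the new epsilon
    half = t_05 * math.sqrt(s2_yx * factor)
    return estimate - half, estimate + half

print("mean of Y at X = 30:    %.2f to %.2f" % limits(30, individual=False))
print("individual Y at X = 30: %.2f to %.2f" % limits(30, individual=True))
```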

Unfortunately, the meaning of this confidence band is apt to be misunderstood. Suppose that we construct 95% confidence intervals for the Y values of a large number of new individual specimens that all have the same value of X. The 95% confidence probability is correct if for each new specimen we draw a new sample of values of (Y, X), compute a new sample regression line and value of s_{y·x}, and construct a new confidence interval from these data. If we make a large number of confidence interval statements from the same sample regression line, the proportion of these statements that is correct is not necessarily 95% for a specific line; it may be more or less. If the sample from which the regression line was computed happens to give an unusually low value of s_{y·x}, so that the confidence band is narrower than usual, fewer than 95% of the confidence interval statements are likely to be correct.

This point can be illustrated from the line constructed in table 6.4.1 (p. 143) as an example of the regression model. The sample line is Ŷ = 1.44 + 0.468X, which has the value 4.716 at X = 7 (x = 2). Further, s_Ŷ at X = 7 is found to be 1.325, and t_{0.05} for 4 d.f. is 2.776. Hence, the 95% confidence limits for an individual Y at X = 7 are 4.716 ± (2.776)(1.325), giving 1.038 and 8.394.

But we know from the population model that any new Y at X = 7 is normally distributed with μ = 5 and σ = 1. The probability that this Y lies between 1.038 and 8.394 is easily calculated from the normal table. It is practically 100%, instead of 95%. In fact, with this sample line, the 95% confidence probability statements are conservative in this way at all six values of X.
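Both halves of this argument can be checked in Python. Part (a) uses the single line of table 6.4.1 and computes, from the normal distribution, the chance that a new Y at X = 7 (x = 2, where μ = 4 + 0.5(2) = 5 as in section 6.5) falls inside its 95% limits; part (b) draws a fresh sample, fresh line and fresh new Y each time, which is the situation in which the 95% statement is correct. The seed and number of repetitions are arbitrary, and because Ȳ and b are not rounded here the limits in (a) come out near 1.04 and 8.40.

```python
import math
import random

X = [0, 2, 3, 7, 8, 10]              # fixed X of table 6.4.1
alpha, beta, sigma = 4.0, 0.5, 1.0  # population values used to build the table
X_bar = sum(X) / len(X)
t_05 = 2.776                         # 5% point of t with 4 d.f.
X_new = 7                            # x = 2
mu_new = alpha + beta * (X_new - X_bar)

def prediction_limits(Y):
    """95% limits (6.12.1) for one new Y at X_new, from a sample Y at the fixed X."""
    n = len(X)
    Y_bar = sum(Y) / n
    Sxx = sum((Xi - X_bar) ** 2 for Xi in X)
    b = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y)) / Sxx
    s2 = sum((Yi - Y_bar - b * (Xi - X_bar)) ** 2 for Xi, Yi in zip(X, Y)) / (n - 2)
    x = X_new - X_bar
    half = t_05 * math.sqrt(s2 * (1 + 1 / n + x ** 2 / Sxx))
    return Y_bar + b * x - half, Y_bar + b * x + half

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (a) The single line of table 6.4.1, reused for every new individual.
lo, hi = prediction_limits([2.6, 1.2, 1.9, 6.0, 5.5, 5.5])
fixed_line = normal_cdf((hi - mu_new) / sigma) - normal_cdf((lo - mu_new) / sigma)

# (b) A fresh sample, fresh line and fresh new Y on every repetition.
rng = random.Random(11)
reps, hits = 20000, 0
for _ in range(reps):
    Y = [alpha + beta * (Xi - X_bar) + rng.gauss(0, sigma) for Xi in X]
    lo_r, hi_r = prediction_limits(Y)
    Y_new = mu_new + rng.gauss(0, sigma)
    hits += lo_r <= Y_new <= hi_r

print(f"(a) limits {lo:.3f} to {hi:.3f}; chance a new Y falls inside: {fixed_line:.4f}")
print(f"(b) proportion of statements that are correct: {hits / reps:.3f}")
```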

The worker who makes many predictions from the same sample line 
naturally wants some kind of probability statement that applies to his 
line. The available techniques are described by Acton (11). 

EXAMPLE 6.12.1 – In the regression of comb weight of chicks on body weight, example 6.9.3, n = 10, X̄ = 83 gm., Ȳ = 60 mg., Σx² = 1,000, Σy² = 6,854, and Σxy = 2,302. Set 95% confidence limits on α, assuming the same set of body weights. Ans. 49.8 to 70.2 mg.

EXAMPLE 6.12.2 – In the chick data, b = 2.302. Test the hypothesis that β = 0. Ans. t = 5.22, P < 0.01.

EXAMPLE 6.12.3 – Since evidently there is a population regression of comb weight on body weight, set 95% limits to the regression coefficient. Ans. 1.28 to 3.32 mg. per gm.

EXAMPLE 6.12.4 – Predict the population average comb weight of 100-gm. chicks. Ans. 99.1 mg., with 95% limits 79.0 to 119.2 mg.

EXAMPLE 6.12.5 – Set 95% confidence limits to the forecast of the comb weight of a randomly chosen 100-gm. chick. Ans. 61.3 to 136.9 mg.
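The answers to examples 6.12.1 through 6.12.5 can be checked together. This Python sketch takes the chick data of example 6.9.3 and uses t₀.₀₅ = 2.306 for 8 d.f., read from the t table; the results agree with the printed answers to within a unit or so in the last digit, which reflects rounding in the hand calculation.

```python
import math

# Chick data of example 6.9.3: body weight X (gm.) and comb weight Y (mg.).
X = [83, 72, 69, 90, 90, 95, 95, 91, 75, 70]
Y = [56, 42, 18, 84, 56, 107, 90, 68, 31, 48]
n = len(X)
t_05 = 2.306   # 5% point of t for n - 2 = 8 d.f., from the t table

X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((Xi - X_bar) ** 2 for Xi in X)
Syy = sum((Yi - Y_bar) ** 2 for Yi in Y)
Sxy = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y))
b = Sxy / Sxx
s = math.sqrt((Syy - Sxy ** 2 / Sxx) / (n - 2))   # s_y.x

def interval(estimate, se):
    """95% limits: estimate minus and plus t_05 * se."""
    return estimate - t_05 * se, estimate + t_05 * se

x = 100 - X_bar                                    # a 100-gm. chick
Y_hat = Y_bar + b * x
se_mean = s * math.sqrt(1 / n + x ** 2 / Sxx)      # (6.11.2)
se_new = s * math.sqrt(1 + 1 / n + x ** 2 / Sxx)   # (6.12.1)

print("6.12.1  alpha:             %.1f to %.1f" % interval(Y_bar, s / math.sqrt(n)))
print("6.12.2  t for beta = 0:    %.2f" % (b / (s / math.sqrt(Sxx))))
print("6.12.3  beta:              %.3f to %.3f" % interval(b, s / math.sqrt(Sxx)))
print("6.12.4  mean at 100 gm.:   %.1f, limits %.1f to %.1f" % ((Y_hat,) + interval(Y_hat, se_mean)))
print("6.12.5  new 100-gm. chick: %.1f to %.1f" % interval(Y_hat, se_new))
```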

EXAMPLE 6.12.6 – In the Indianapolis motor races (example 6.3.2), estimate the speed for the year 1946, for which the coded X is 35, and give 95% limits, remembering that individual speeds are being estimated. Ans. 122.3 miles per hour, with 95% limits 118.9–125.7. The actual speed in 1946 was 114.8 miles per hour, lying outside the limits. The regression formula overestimated the speeds consistently in the ten years following 1945.

EXAMPLE 6.12.7 – Construct 80% confidence bands for the individual race results in the period 1911–1941. Since there were 29 races, you should find about 6 results lying outside the band.

EXAMPLE 6.12.8 – In time series such as these races, the assumption that the $\varepsilon$'s are independent of each other may not hold. Winning of successive races by the same man, type of car, or racing technique all raise doubts on this point. If the $\varepsilon$'s are not independent, $\bar Y$ and $b$ remain unbiased estimates of $\alpha$ and $\beta$, but they are no longer the most precise estimates, and the formulas for standard errors and confidence limits become incorrect.
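The answers to examples 6.12.1–6.12.5 all follow from the six summary values of the chick data. A minimal Python sketch, assuming the two-sided 5% point t = 2.306 for 8 d.f. is read from a t-table (the standard library has no t-distribution). The last printed digit can differ slightly from the answers above, which were computed with rounded intermediate values.

```python
import math

# Summary values from example 6.12.1: comb weight Y (mg) on body weight X (gm).
n, x_bar, y_bar = 10, 83.0, 60.0
sum_x2, sum_y2, sum_xy = 1000.0, 6854.0, 2302.0
T_05 = 2.306  # assumed: tabled two-sided 5% point of t for n - 2 = 8 d.f.

b = sum_xy / sum_x2                                       # regression coefficient
s = math.sqrt((sum_y2 - sum_xy**2 / sum_x2) / (n - 2))    # s_{y.x}


def limits(centre, se):
    """Return the interval centre -/+ t * se as a (lower, upper) tuple."""
    return centre - T_05 * se, centre + T_05 * se


alpha_ci = limits(y_bar, s / math.sqrt(n))                # example 6.12.1
t_beta = b / (s / math.sqrt(sum_x2))                      # example 6.12.2
beta_ci = limits(b, s / math.sqrt(sum_x2))                # example 6.12.3

x = 100.0 - x_bar                                         # a 100-gm chick, as a deviation
y_hat = y_bar + b * x
mean_ci = limits(y_hat, s * math.sqrt(1 / n + x**2 / sum_x2))       # example 6.12.4
indiv_ci = limits(y_hat, s * math.sqrt(1 + 1 / n + x**2 / sum_x2))  # example 6.12.5

print(f"alpha: {alpha_ci[0]:.1f} to {alpha_ci[1]:.1f} mg")
print(f"t for beta = 0: {t_beta:.2f}")
print(f"beta: {beta_ci[0]:.2f} to {beta_ci[1]:.2f} mg per gm")
print(f"mean at 100 gm: {y_hat:.1f} ({mean_ci[0]:.1f} to {mean_ci[1]:.1f})")
print(f"new chick at 100 gm: {indiv_ci[0]:.1f} to {indiv_ci[1]:.1f}")
```

The only difference between examples 6.12.4 and 6.12.5 is the extra 1 under the square root, which accounts for the scatter of an individual chick about the population line.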


6.13 – Testing a deviation that looks suspiciously large. When Y is plotted against X, one or two points sometimes look as if they lie far from the regression line. When the line has been computed, we can examine this question further by drawing the line and looking at the deviations for these points, or by calculating the values of $d_{y\cdot x}$ for them.

In this process one needs some guidance with respect to the question: When is a deviation large enough to excite suspicion? A test of significance is carried out as follows:

1. Select the point with the largest $d_{y\cdot x}$ (in absolute value). As an illustration, we use the regression of wormy fruit on size of apple crop, table 6.9.1 and figure 6.9.1, p. 151. We have already commented that for tree 4, with X = 22, Y = 53, the deviation $d_{y\cdot x} = 11.04$ looks large.
2. Recompute the regression with this point omitted. This requires little work, since from the values $\Sigma X$, $\Sigma Y$, $\Sigma X^2$, $\Sigma Y^2$, and $\Sigma XY$ we simply subtract the contribution for tree 4. We find for the remaining n − 1 = 11 points:

$$\bar X = 18.73, \qquad \Sigma x^2 = 914$$
$$\hat Y = 44.27 - 1.053x, \qquad s_{y\cdot x}^2 = 15.50, \text{ with 9 d.f.}$$

3. For the suspect, x = 22 − 18.73 = 3.27, $\hat Y$ = 44.27 − (1.053)(3.27) = 40.83, Y = 53.

4. Since the suspect was not used in computing this line, we can regard it as a new member of the population, and test whether its deviation from the line is within sampling error. We have $Y - \hat Y$ = 53 − 40.83 = 12.17. Since formula 6.12.1 is applicable to the reduced sample of size (n − 1), the variance due to sampling errors is

$$s_{Y-\hat Y}^2 = s_{y\cdot x}^2\left(1 + \frac{1}{n-1} + \frac{x^2}{\Sigma x^2}\right) = (15.50)\left(1 + \frac{1}{11} + \frac{(3.27)^2}{914}\right) = (15.50)(1.1026) = 17.09$$

The value of t is

$$t = \frac{Y - \hat Y}{s_{Y-\hat Y}} = \frac{12.17}{\sqrt{17.09}} = 2.943,$$

with 9 d.f. The 2% level of t is 2.821 and the 1% level is 3.250. By interpolation, P is about 0.019.

As it stands, however, this t-test does not apply, because the test assumes that the new member is randomly drawn. Instead, we selected it because it gave the largest deviation of the 12 points. If P is the probability that t for a random deviation exceeds some value $t_0$, then for small values of P the probability that $t_{\max}$ (computed for the largest of n deviations) exceeds $t_0$ is roughly nP. Consequently, the significance probability for our t-test is approximately (12)(0.019) = 0.23, and the null hypothesis is not rejected.
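Steps 2 to 4, together with the rough nP allowance, can be sketched in Python as follows. The single-deviation P of 0.019 is taken from the interpolation in a t-table given above rather than computed, since the standard library has no t-distribution; the multiplication by n is the approximation described in the text, not an exact correction.

```python
import math

# Values quoted in section 6.13 for the apple data with tree 4 removed.
n_points = 12                      # size of the full sample
n_reduced = n_points - 1           # 11 remaining points
x_bar_red, sum_x2_red = 18.73, 914.0
y_bar_red, b_red = 44.27, -1.053   # fitted line: Y-hat = 44.27 - 1.053 x
s2_red = 15.50                     # s_{y.x}^2 with 9 d.f.

X_suspect, Y_suspect = 22.0, 53.0  # tree 4

x = X_suspect - x_bar_red
y_hat = y_bar_red + b_red * x
# Formula 6.12.1 applied to the reduced sample: variance of a new Y about the line.
var_dev = s2_red * (1 + 1 / n_reduced + x**2 / sum_x2_red)
t = (Y_suspect - y_hat) / math.sqrt(var_dev)

# P for a single random deviation, interpolated from a t-table (9 d.f.) as in the text.
p_single = 0.019
# Rough allowance for having picked the largest of n deviations: multiply by n.
p_largest = min(1.0, n_points * p_single)

print(f"Y-hat = {y_hat:.2f}, variance = {var_dev:.2f}, t = {t:.3f}")
print(f"adjusted P ~ {p_largest:.2f}")
```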

When the null hypothesis is rejected, an inquiry is indicated to see whether there were any circumstances peculiar to this point, or any error of measurement or recording, that caused the large deviation. In some cases an error is unearthed and corrected. In others, some extraneous causal factor that made the point aberrant is discovered, although the fault cannot be corrected. In this event, the point should be omitted from the line that is to be reported and used, provided that the causal factor is known to affect only this point. When no explanation is found, the situation is perplexing. It is usually best to examine the conclusions obtained with the suspect (i) included and (ii) excluded. If these conclusions differ materially, as they sometimes do, it is well to note that either may be correct.
6.14 – Prediction of X from Y. Linear calibration. In some applications the regression line is used to predict X from Y, but is constructed by measuring Y at selected values of X. In this event, as pointed out in the discussion in section 6.9 (p. 150), the prediction must be made from the regression of Y on X. For example, X may be the concentration of some element (e.g., boron or iron) in a liquid or in plant fiber and Y a quick chemical or photometric measurement that is linearly related to X. The investigator makes up a series of specimens with known amounts of X and measures Y for each specimen. From these data, the calibration curve, the linear regression of Y on X, is computed. Having measured Y for a new specimen, the estimate of $x = X - \bar X$ is

$$\hat x = (Y - \bar Y)/b$$

Confidence limits for x and X are obtained from the method in section 6.12 by which we obtained confidence limits for Y given x. As an illustration we cite the example of sections 6.11–6.12, in which Y = percentage of wormy fruits and X = size of crop (though with these data we would in practice use the regression of X on Y, since both regressions are meaningful).

We shall find 95% confidence limits for the size of crop in a new tree with 40 per cent of wormy fruit. Turn to figure 6.11.1 (p. 155). Draw a horizontal line at Y = 40. The two confidence limits are the values of X at the points where this line meets the confidence curves GE and HF. Our eye readings were X = 12 and X = 38. The point estimate $\hat X$ of X is, of course, the value of X, 24, at which the horizontal line meets the fitted regression line.

For a numerical solution, the fitted line is $\hat Y = \bar Y + bx$, where $\bar Y = 45$, $b = -1.013$. Hence the value of x when Y = 40 is estimated as

$$\hat x = (Y - \bar Y)/b = -(40 - 45)/1.013 = 4.936, \qquad \hat X = 23.9 \text{ hundreds}$$

To find the 95% confidence limits for x, we start with the confidence limits of Y given x:

$$Y = \bar Y + bx \pm t s_{y\cdot x}\sqrt{1 + \frac{1}{n} + \frac{x^2}{\Sigma x^2}} \qquad (6.14.1)$$

where t is the 5% level for (n − 2) d.f. Expression (6.14.1) is solved as a quadratic equation in x for given Y. After some manipulation the two roots can be expressed in the following form, which appears the easiest for numerical work:

$$x = \frac{\hat x \pm \dfrac{t s_{y\cdot x}}{b}\sqrt{\dfrac{n+1}{n}\,(1 - c^2) + \dfrac{\hat x^2}{\Sigma x^2}}}{1 - c^2} \qquad (6.14.2)$$

where

$$c^2 = \frac{t^2 s_b^2}{b^2} = \frac{1}{\Sigma x^2}\left(\frac{t s_{y\cdot x}}{b}\right)^2$$

In this example n = 12, t = 2.228 (10 d.f.), $s_{y\cdot x} = 5.233$, $\Sigma x^2 = 924$, $b = -1.013$, $\hat x = 4.936$. These give

$$\frac{t s_{y\cdot x}}{b} = \frac{(2.228)(5.233)}{-1.013} = -11.509, \qquad c^2 = \frac{(11.509)^2}{924} = 0.1434$$

From (6.14.2) the limits for x are

$$x = \frac{4.936 \pm (11.509)\sqrt{(1.0833)(0.8566) + 0.0264}}{0.8566}$$

This gives −7.4 and +18.9 for x, or 11.6 and 37.9 for X, in close agreement with the graphical estimate.
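Formula 6.14.2 translates directly into a small function. This is a sketch: the name `calibration_limits` and the optional argument `m` are hypothetical. `m` follows the remark later in this section that the 1 under the square root becomes 1/m when Y is a mean of m readings, so the factor (n + 1)/n becomes 1/m + 1/n; m = 1 reproduces 6.14.2 exactly.

```python
import math


def calibration_limits(x_hat, t, s_yx, b, sum_x2, n, m=1):
    """Confidence limits for x = X - X_bar from a fitted regression of Y on X,
    following formula 6.14.2. m is the number of readings averaged to give the
    new Y (m = 1 reproduces 6.14.2, whose factor (n + 1)/n is 1/m + 1/n)."""
    k = t * s_yx / b                 # t s_{y.x} / b
    c2 = k**2 / sum_x2               # c^2 = t^2 s_b^2 / b^2
    if c2 >= 1:
        # b is not significant: the denominator 1 - c^2 is not positive.
        raise ValueError("no finite limits: b is not significant at this level")
    half = abs(k) * math.sqrt((1 / m + 1 / n) * (1 - c2) + x_hat**2 / sum_x2)
    return (x_hat - half) / (1 - c2), (x_hat + half) / (1 - c2)


# Apple example: Y = 40% wormy fruit, so x_hat = (40 - 45) / (-1.013).
x_hat = (40 - 45) / -1.013
lower, upper = calibration_limits(x_hat, t=2.228, s_yx=5.233, b=-1.013, sum_x2=924, n=12)
print(f"x_hat = {x_hat:.3f}; limits for x: {lower:.1f} to {upper:.1f}")
```

Raising an error when c² ≥ 1 mirrors the point made below: if b is not significant, this approach gives no finite limits.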

The quantity $c = ts_b/b$ is related to the test of significance of b. If b is significant at the 5% level, $|b|/s_b > t$, so that $|c| < 1$ and hence $c^2 < 1$. If b is not significant, the denominator in equation (6.14.2) becomes negative, and finite confidence limits cannot be found by this approach. If c is small (b highly significant), $c^2$ is negligible and the limits become

$$\hat x \pm \frac{t s_{y\cdot x}}{b}\sqrt{1 + \frac{1}{n} + \frac{\hat x^2}{\Sigma x^2}}$$

These are of the form $\hat x \pm t s_{\hat x}$, where $s_{\hat x}$ denotes the factor that multiplies t. In large samples, $s_{\hat x}$ can be shown to be the estimated standard error of $\hat x$, as this result suggests.

In practice, Y is sometimes the average of m independent measurements on the new specimen. The number 1 under the square root sign in (6.14.1) then becomes 1/m.


6.15 – Partitioning the sum of squares of the dependent variate. Regression computations may be looked upon as a process of partitioning $\Sigma Y^2$ into 3 parts which are both useful and meaningful. You have become accustomed to dividing $\Sigma Y^2$ into $(\Sigma Y)^2/n$ and the remainder, $\Sigma y^2$; then subdividing $\Sigma y^2$ into $(\Sigma xy)^2/\Sigma x^2$ and $\Sigma d_{y\cdot x}^2$. This means that you have divided $\Sigma Y^2$ into three portions:

$$\Sigma Y^2 = (\Sigma Y)^2/n + (\Sigma xy)^2/\Sigma x^2 + \Sigma d_{y\cdot x}^2$$

Each of these portions can be associated exactly with the sum of squares of a segment of the ordinates, Y. To illustrate this, a simple set of data has been set up in table 6.15.1 and graphed in figure 6.15.1.

In the figure the ordinate at X = 12 is partitioned into 3 segments:

$$Y = \bar Y + \hat y + d_{y\cdot x},$$

where $\hat y = \hat Y - \bar Y = bx$ is the deviation of the point $\hat Y$ on the fitted line from $\bar Y$. Each of the other ordinates may be divided similarly, though negative segments make the geometry less obvious. The lengths are all set out in table 6.15.2 and the several segments are emphasized in figure 6.15.1.

TABLE 6.15.1
Data Set Up to Illustrate the Partition of ΣY²

X     2    4    6    8    10    12    14     Σ = 56
Y     4    2    5    9     3    11     8     Σ = 42

$n = 7$, $\bar X = 8$, $\bar Y = 6$, $\Sigma x^2 = 112$, $\Sigma y^2 = 68$, $\Sigma xy = 56$
Fig. 6.15.1 – Graph of data in table 6.15.1. The ordinate at X = 12 is shown divided into 2 parts, $\bar Y = 6$ and $y = 5$. Then y is subdivided into $\hat y = 2$ and $d_{y\cdot x} = 3$. Thus $Y = \bar Y + \hat y + d_{y\cdot x} = 6 + 2 + 3 = 11$.

TABLE 6.15.2
Lengths of Ordinates in Table 6.15.1 Together With Segments Into Which They Are Partitioned

Pair Number      Ordinate Y    Mean Ȳ    Deviation ŷ    Deviation From Regression d_y·x
1                    4            6          −3                  1
2                    2            6          −2                 −2
3                    5            6          −1                  0
4                    9            6           0                  3
5                    3            6           1                 −4
6                   11            6           2                  3
7                    8            6           3                 −1
Sum                 42           42           0                  0
Sum of squares     320          252          28                 40

Observe that in each line of the table (including the two at the bottom) the sum of the last three numbers is equal to the number in column Y. Corresponding to the relation

$$Y = \bar Y + \hat y + d_{y\cdot x},$$

we have the following identity in the sums of squares:

$$\Sigma Y^2 = \Sigma \bar Y^2 + \Sigma \hat y^2 + \Sigma d_{y\cdot x}^2,$$

each of the three cross-product terms being zero. The sums of squares of the ordinates, $\Sigma Y^2 = 320$, and of the deviations from regression, $\Sigma d_{y\cdot x}^2 = 40$, are already familiar. It remains to identify $(\Sigma Y)^2/n$ with $\Sigma \bar Y^2$ and $(\Sigma xy)^2/\Sigma x^2$ with $\Sigma \hat y^2$. First,


$$\frac{(\Sigma Y)^2}{n} = \frac{(n\bar Y)^2}{n} = n\bar Y^2 = \Sigma \bar Y^2$$

That is, the correction for the mean is simply the sum of squares of the mean taken n times. Second,


$$\frac{(\Sigma xy)^2}{\Sigma x^2} = \frac{(\Sigma xy)^2}{(\Sigma x^2)^2}\,\Sigma x^2 = b^2\Sigma x^2 = \Sigma b^2x^2 = \Sigma \hat y^2$$

So the sum of squares attributable to the regression turns out to be the sum of squares of the deviations of the points $\hat Y$ on the fitted line from their mean.

The vanishing of the cross-product terms is easily verified by the method used in section 6.6.

Corresponding to the partition of $\Sigma Y^2$ there is a partition of the total degrees of freedom into three parts. Both partitions are shown in table 6.15.3. The n = 7 observations contribute 7 degrees of freedom, of which 1 is associated with the mean and 1 with the slope b of the regression line, leaving 5 for the deviations from regression. In most applications the first line in this table is omitted as being of no interest, the breakdown taking the form presented in table 6.15.4.

TABLE 6.15.3
Analysis of Variance of Y in Table 6.15.1

Source of Variation          Symbol    Degrees of Freedom    Sum of Squares           Mean Square
The mean                     Ȳ         1                     (ΣY)²/n = 252
Regression                   b         1                     (Σxy)²/Σx² = 28
Deviations from regression   d_y·x     n − 2 = 5             Σd²_y·x = 40             s²_y·x = 8
Total                        Y         n = 7                 ΣY² = 320

Σy² = 28 + 40 = 68, d.f. = n − 1 = 6.


TABLE 6.15.4
Analysis of Variance of y in Table 6.15.1

Source of Variation           Degrees of Freedom    Sum of Squares    Mean Square
Regression                    1                     28                28
Deviations from regression    5                     40                 8
Deviations from mean          6                     68                11.3

Table 6.15.4 is an analysis of variance table. In addition to providing a neat summary of calculations about variability, it proves of great utility when we come to study curved regressions and comparisons among more than two means. The present section is merely an introduction to the technique, one of the major contributions of R. A. Fisher (5).
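The partition can be checked numerically on the data of table 6.15.1. A minimal Python sketch: it rebuilds the columns of table 6.15.2, adds up the three sums of squares, and confirms that the three cross-product terms vanish.

```python
# Table 6.15.1 data.
X = [2, 4, 6, 8, 10, 12, 14]
Y = [4, 2, 5, 9, 3, 11, 8]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

x = [xi - x_bar for xi in X]                       # deviations of X
y = [yi - y_bar for yi in Y]                       # deviations of Y
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

y_fit = [b * xi for xi in x]                       # y-hat = Y-hat - Y-bar
d = [yi - fi for yi, fi in zip(y, y_fit)]          # deviations from regression

ss_total = sum(v * v for v in Y)                   # Sigma Y^2
ss_mean = n * y_bar**2                             # (Sigma Y)^2 / n
ss_reg = sum(v * v for v in y_fit)                 # (Sigma xy)^2 / Sigma x^2
ss_dev = sum(v * v for v in d)                     # Sigma d_{y.x}^2
ms_dev = ss_dev / (n - 2)                          # s_{y.x}^2, n - 2 d.f.

# The three cross-product terms, each of which should vanish.
cross = (
    sum(y_bar * f for f in y_fit),
    sum(y_bar * e for e in d),
    sum(f * e for f, e in zip(y_fit, d)),
)

print("pair   Y  mean  y_hat   d")
for i, (Yi, fi, di) in enumerate(zip(Y, y_fit, d), start=1):
    print(f"{i:>4} {Yi:>3} {y_bar:>5.0f} {fi:>6.0f} {di:>3.0f}")
print(f"{ss_total:.0f} = {ss_mean:.0f} + {ss_reg:.0f} + {ss_dev:.0f}; mean square {ms_dev:.0f}")
```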

EXAMPLE 6.15.1 – Dawes (6) determined the “density” of the melanin content of the skin of 24 male frogs together with their weights. Since “Some of the 24 males were selected for extreme duskiness or pallor so as to provide a measure of the extent of variability,” that is, since selection was exercised on density, this variate must be taken as X.

Density, X    0.13   0.15   0.28   0.58   0.68   0.31   0.35   0.58
Weight, Y     13     18     18     18     18     19     21     22

Density, X    0.03   0.69   0.38   0.54   1.00   0.73   0.77   0.82
Weight, Y     22     24     25     25     25     27     27     27

Density, X    1.29   0.70   0.38   0.54   1.08   0.86   0.40   1.67
Weight, Y     28     29     30     30     35     37     39     42

Calculate $\bar X = 0.6225$ units, $\bar Y = 25.79$ grams, $\Sigma x^2 = 3.3276$, $\Sigma y^2 = 1{,}211.96$, $\Sigma xy = 40.022$.

EXAMPLE 6.15.2 – In example 6.15.1 test the hypothesis $\beta = 0$. Ans. $t = 3.81$, $P < 0.01$.

EXAMPLE 6.15.3 – Analyze the variance of the frog weights, as follows:

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Mean                   1                     15,965.04
Regression             1                        481.36
Deviations             22                       730.60         33.21
Total                  24                    17,177.00
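A minimal Python sketch that recomputes the summary values of example 6.15.1, the t of example 6.15.2, and the analysis of variance of example 6.15.3 directly from the 24 frogs listed above.

```python
import math

# Example 6.15.1: melanin density X and weight Y (grams) of 24 male frogs.
X = [0.13, 0.15, 0.28, 0.58, 0.68, 0.31, 0.35, 0.58,
     0.03, 0.69, 0.38, 0.54, 1.00, 0.73, 0.77, 0.82,
     1.29, 0.70, 0.38, 0.54, 1.08, 0.86, 0.40, 1.67]
Y = [13, 18, 18, 18, 18, 19, 21, 22,
     22, 24, 25, 25, 25, 27, 27, 27,
     28, 29, 30, 30, 35, 37, 39, 42]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_x2 = sum((x - x_bar) ** 2 for x in X)
sum_y2 = sum((y - y_bar) ** 2 for y in Y)
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

b = sum_xy / sum_x2
ss_mean = sum(Y) ** 2 / n            # line "Mean" of example 6.15.3
ss_reg = sum_xy ** 2 / sum_x2        # line "Regression"
ss_dev = sum_y2 - ss_reg             # line "Deviations", n - 2 d.f.
ms_dev = ss_dev / (n - 2)
ss_total = sum(y * y for y in Y)
t = b / math.sqrt(ms_dev / sum_x2)   # example 6.15.2, test of beta = 0

print(f"X-bar = {x_bar:.4f}, Y-bar = {y_bar:.2f}")
print(f"sums: {sum_x2:.4f}, {sum_y2:.2f}, {sum_xy:.3f}")
print(f"mean {ss_mean:.2f}; regression {ss_reg:.2f}; deviations {ss_dev:.2f} "
      f"(mean square {ms_dev:.2f}); total {ss_total:.2f}")
print(f"t = {t:.2f} with {n - 2} d.f.")
```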
EXAMPLE 6.15.4 – How nearly free from error is the measurement of melanin density? After preparation of a solution from the skin of the frogs, the intensity of the color was evaluated in a colorimeter and the readings then transferred graphically into neutral densities. The figures reported are means of from 3 to 6 determinations. The error of this kind of measurement is usually appreciable. This makes the estimate of regression biased downwards. Had the investigator not wished to learn about extremes of density, the regression of density on weight might have been not only unbiased but more informative.

6.16 – Galton’s use of the term “regression.” In his studies of inheritance Galton developed the idea of regression. Of the “law of universal regression” (7) he said, “Each peculiarity in a man is shared by his kinsman, but on the average in a less degree.” His friend, Karl Pearson (8), collected more than a thousand records of heights of members of family groups. Figure 6.16.1 shows his regression of son’s height on father’s. Though tall fathers do tend to have tall sons, yet the average height of sons of a group of tall fathers is less than their fathers’ height. There is a regression, or going back, of sons’ heights toward the average height of all men, as evidenced by the regression coefficient, 0.516, substantially less than 1.

Fig. 6.16.1 – Regression of son’s stature on father’s (8): $\hat Y = 0.516X + 33.73$; 1,078 families. Horizontal axis: father’s height (inches).

6.17 – Regression when X is subject to error. Thus far we have assumed that the X-variable in regression is measured without error. Since no measuring instrument is perfect, this assumption is often unrealistic. A more realistic model is one that assumes $Y = \alpha + \beta(X - \bar X) + \varepsilon$ as before, but regards X as an unknown true value. Our measurement of X is $X' = X + e$, where e is the error of measurement. For any specimen we know (Y, X') but not X.

If the measurement is unbiased, e, like $\varepsilon$, is a random variable following a distribution with mean 0. The errors e may arise from several sources. For instance, if X is the average price of a commodity or the average family income in a region of a country, this is usually estimated from a sample of shops or of families, so that X' is subject to a sampling error. With some concepts like “educational level” or “economic status” there may be no fully satisfactory method of measurement, so that e may represent in part measurement of the wrong concept.

If $\varepsilon$, e, and the true X are all normally and independently distributed, it is known that Y and X' follow a bivariate normal distribution (section 7.4). The regression of Y on X' is linear, with regression coefficient

$$\beta' = \beta/(1 + \lambda),$$

where $\lambda = \sigma_e^2/\sigma_X^2$. (If X is not normal, this result holds in large samples, and approximately in small samples if $\lambda$ is small.) Thus, with errors in X, the sample regression coefficient $b'$ of Y on X' no longer provides an unbiased estimate of $\beta$, but of $\beta/(1 + \lambda)$.

If the principal objective is to estimate $\beta$, often called the structural regression coefficient, the extent of this distortion downwards is determined by the ratio $\lambda = \sigma_e^2/\sigma_X^2$. Sometimes it is possible to obtain an estimate $s_e^2$ of $\sigma_e^2$. Since $\sigma_{X'}^2 = \sigma_X^2 + \sigma_e^2$, an estimate of $\lambda$ is $\hat\lambda = s_e^2/(s_{X'}^2 - s_e^2)$. From $\hat\lambda$ we can judge whether the downward bias is negligible or not. If it is not negligible, the revised estimate $b'(1 + \hat\lambda)$ should remove most of the bias.

In laboratory experimentation, $\lambda$ is often small even with a measuring instrument that is not highly accurate. For example, suppose that $\sigma_X = 20$, $\mu_X = 100$, so that nearly all the values of the true X's lie between 50 and 150. Consider $\sigma_e = 3$. This implies that about half of the true X's are measured with an error greater than 2 and about one third of them with an error greater than 3, a rather imprecise standard of performance. Nevertheless, $\lambda$ is only 9/400 = 0.022.

If the objective is to predict the population regression line or the value of an individual Y from the sample of values (Y, X'), the methods of sections 6.11 and 6.12 may still be used, with X' in place of X, provided that X, e, and $\varepsilon$ are approximately normal. The presence of errors in X decreases the accuracy of the predictions, because the residual variance is increased, though only to a minor extent if $\lambda$ is small. The relation between $\sigma_{Y\cdot X'}^2$ and $\sigma_{Y\cdot X}^2$ may be put in two equivalent forms:

$$\sigma_Y^2 - \sigma_{Y\cdot X'}^2 = \frac{\sigma_Y^2 - \sigma_{Y\cdot X}^2}{1 + \lambda} \qquad (6.17.1)$$

or,

$$\sigma_{Y\cdot X'}^2 = \sigma_{Y\cdot X}^2 + \frac{\lambda}{1 + \lambda}\left(\sigma_Y^2 - \sigma_{Y\cdot X}^2\right) \qquad (6.17.2)$$
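The estimate $\hat\lambda$, the corrected slope $b'(1 + \hat\lambda)$ and formula 6.17.2 are one-liners. A minimal Python sketch; the function names are hypothetical, and the observed slope 0.80 in the usage lines is a made-up value used only to show the size of the correction when $\lambda = 0.0225$.

```python
def lambda_hat(s2_e, s2_x_obs):
    """Estimate of lambda = sigma_e^2 / sigma_X^2, using s_e^2 and the variance
    s_{X'}^2 of the measured values (true variance plus error variance)."""
    return s2_e / (s2_x_obs - s2_e)


def corrected_slope(b_obs, lam):
    """Revised estimate b'(1 + lambda) of the structural coefficient beta."""
    return b_obs * (1 + lam)


def residual_variance_with_error(var_y, var_y_given_x, lam):
    """sigma^2_{Y.X'} from sigma_Y^2, sigma^2_{Y.X} and lambda (formula 6.17.2)."""
    return var_y_given_x + lam / (1 + lam) * (var_y - var_y_given_x)


# Laboratory illustration in the text: sigma_X = 20, sigma_e = 3.
lam = 3**2 / 20**2
print(f"lambda = {lam:.4f}")
# Hypothetical observed slope, to show the size of the correction.
print(f"b' = 0.80 -> corrected {corrected_slope(0.80, lam):.3f}")
```

With $\lambda$ this small the correction changes the slope by only about 2%, which is the point of the laboratory illustration.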

Berkson (10) has pointed out an exception to the above analysis. Many laboratory experiments are conducted by setting X' at a series of fixed values. For instance, a voltage may be set at a series of predetermined levels $X_1', X_2', \ldots$ on a voltmeter. Owing to errors in the voltmeter or other defects in the apparatus, the true voltages $X_1, X_2, \ldots$ differ from the set voltages.

In this situation we still have $Y = \alpha + \beta X + \varepsilon$, $X' = X + e$. In both our original case (X normal) and in Berkson’s case (X' fixed) it follows that

$$Y = \alpha + \beta X' + (\varepsilon - \beta e) \qquad (6.17.3)$$

The difference is this. In our case, e and X' are correlated because of the relation $X' = X + e$. Consequently, the residual $(\varepsilon - \beta e)$ is correlated with X' and does not have mean zero for fixed X'. This vitiates Assumption 2 of the basic model (section 6.4). With X' fixed, however, e is correlated with X but not with X', and the model (6.17.3) satisfies the assumptions for a linear regression. The important practical conclusion is that $b'$, the regression of Y on X', remains an unbiased estimate of $\beta$.
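The contrast between the two cases can be seen in a small simulation. This is a sketch under stated assumptions: the population line, the error standard deviations and the five set levels are all hypothetical values chosen for illustration, and normal deviates come from Python's `random.Random.gauss`. With $\lambda = 100/400 = 0.25$, the slope under classical errors should settle near $\beta/(1 + \lambda) = 1.6$, while under Berkson's scheme it should stay near $\beta = 2$.

```python
import random


def slope(xs, ys):
    """Least squares regression coefficient of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


rng = random.Random(2024)
ALPHA, BETA = 10.0, 2.0                         # hypothetical population line
SIGMA_X, SIGMA_E, SIGMA_EPS = 20.0, 10.0, 5.0   # hypothetical; lambda = 0.25
N = 20_000

# Classical case: the true X is random and we record X' = X + e.
true_x = [rng.gauss(100.0, SIGMA_X) for _ in range(N)]
obs_x = [x + rng.gauss(0.0, SIGMA_E) for x in true_x]
y = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA_EPS) for x in true_x]
b_classical = slope(obs_x, y)      # expected near BETA / (1 + lambda) = 1.6

# Berkson's case: X' is set at fixed levels; the true X is X' - e.
levels = [60.0, 80.0, 100.0, 120.0, 140.0]
set_x = [levels[i % len(levels)] for i in range(N)]
true_x_b = [xs - rng.gauss(0.0, SIGMA_E) for xs in set_x]
y_b = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA_EPS) for x in true_x_b]
b_berkson = slope(set_x, y_b)      # expected near BETA = 2.0

print(f"classical errors: b' = {b_classical:.3f}")
print(f"Berkson errors:   b' = {b_berkson:.3f}")
```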

6.18 – Fitting a straight line through the origin. In some data the nature of the variables Y and X makes it clear that when X = 0, Y must be 0. If a straight-line regression appears to be a satisfactory fit, we have the relation

$$Y = \beta X + \varepsilon$$

where, in the simplest situations, the residual $\varepsilon$ follows $\mathcal{N}(0, \sigma^2)$. The least squares estimate of $\beta$ is $b = \Sigma XY/\Sigma X^2$. The residual mean square is

$$s_{y\cdot x}^2 = \{\Sigma Y^2 - (\Sigma XY)^2/\Sigma X^2\}/(n - 1)$$

with (n − 1) d.f. Confidence limits for $\beta$ are

$$b \pm t s_b, \qquad s_b = s_{y\cdot x}/\sqrt{\Sigma X^2},$$

where t is read from the t-table with (n − 1) d.f. and the appropriate probability.
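The fit through the origin differs from the usual line only in using uncorrected sums of squares and products and (n − 1) d.f. A minimal Python sketch; `fit_through_origin` is a hypothetical name, the four data points are made up for illustration, and t = 3.182 is the tabled 5% point for 3 d.f.

```python
import math


def fit_through_origin(X, Y, t):
    """Fit Y = beta X + eps by least squares (section 6.18).
    t is the tabled t value for n - 1 d.f. at the chosen probability level.
    Returns b, the residual mean square s^2, and the limits b -/+ t s_b."""
    n = len(X)
    sum_xx = sum(x * x for x in X)               # uncorrected Sigma X^2
    sum_xy = sum(x * y for x, y in zip(X, Y))    # uncorrected Sigma XY
    sum_yy = sum(y * y for y in Y)               # uncorrected Sigma Y^2
    b = sum_xy / sum_xx
    s2 = (sum_yy - sum_xy**2 / sum_xx) / (n - 1)   # n - 1 d.f.
    s_b = math.sqrt(s2 / sum_xx)
    return b, s2, (b - t * s_b, b + t * s_b)


# Hypothetical illustration: four points and t = 3.182 (5% level, 3 d.f.).
b, s2, (lo, hi) = fit_through_origin([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], t=3.182)
print(f"b = {b:.3f}, s^2 = {s2:.4f}, 95% limits {lo:.3f} to {hi:.3f}")
```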

This model should not be adopted without careful inspection of the 
data, since complications can arise. If the sample values of X are all some 
distance from zero, plotting may show that a straight line through the 
origin is a poor fit, although a straight line that is not forced to go through 
the origin seems adequate. The explanation may be that the population 
relation between Y and X is curved, the curvature being marked near zero 
but slight in the range within which X has been measured. A straight line 
of the form (a + bx) will then be a good approximation within the sample
range, though untrustworthy for extrapolation. If the mathematical form 
of the curved relation is known, it may be fitted by methods outlined in 
chapter 15. 

It is sometimes useful to test the null hypothesis that the line, assumed straight, goes through the origin. The first step is to fit the usual two-parameter line $(\alpha + \beta x)$, i.e., $\alpha + \beta(X - \bar X)$, by the methods given earlier in this chapter. The condition that the population line goes through the origin is $\alpha - \beta\bar X = 0$. The sample estimate of this quantity is $\bar Y - b\bar X$, with estimated variance

$$s_{y\cdot x}^2\left(\frac{1}{n} + \frac{\bar X^2}{\Sigma x^2}\right)$$

Hence, the value of t for the test of significance is

$$t = \frac{\bar Y - b\bar X}{s_{y\cdot x}\sqrt{1/n + \bar X^2/\Sigma x^2}} \qquad (6.18.1)$$

with (n − 2) d.f. This test is a particular case of the technique presented in section 6.11 for finding confidence limits for the population mean value of Y corresponding to a given value of X.

The following example comes from a study (9) of the forces necessary to draw plows at the speeds commonly attained by tractors. Those results of the regression calculations that are needed are shown under table 6.18.1.

TABLE 6.18.1
Draft and Speed of Plows Drawn by Tractors

Draft (lbs.) Y        425   420   480   495   540   530   590   610   690   680
Speed (m.p.h.) X      0.9   1.3   2.0   2.7   3.4   3.4   4.1   5.2   5.5   6.0

$\bar X$ = 3.45 m.p.h.      $\bar Y$ = 546 lbs.      n = 10
$\Sigma x^2$ = 27.985       $\Sigma y^2$ = 82,490    $\Sigma xy$ = 1,492.0
b = 53.31 lbs. per m.p.h.
$s_{y\cdot x}^2$ = 368.1 with 8 d.f.


One might suggest that the line should go through the origin, since when the plow is not moving there is no draft. However, inspection of table 6.18.1, or a plot of the points, makes it clear that when the line is extrapolated to X = 0, the predicted Y is well above 0, as would be expected since inertia must be overcome to get the plow moving. From (6.18.1) we have

$$t = \frac{546 - (53.31)(3.45)}{\sqrt{368.1\,(1/10 + 3.45^2/27.985)}} = \frac{362.1}{13.91} = 26.0$$

with 8 d.f., confirming that the line does not go through the origin.
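The test of whether the draft line passes through the origin can be reproduced from the raw values of table 6.18.1. A minimal Python sketch of formula 6.18.1; it fits the usual two-parameter line and divides the estimated height at X = 0 by its standard error.

```python
import math

speed = [0.9, 1.3, 2.0, 2.7, 3.4, 3.4, 4.1, 5.2, 5.5, 6.0]    # X, m.p.h.
draft = [425, 420, 480, 495, 540, 530, 590, 610, 690, 680]   # Y, lbs.
n = len(speed)
x_bar, y_bar = sum(speed) / n, sum(draft) / n
sum_x2 = sum((x - x_bar) ** 2 for x in speed)
sum_y2 = sum((y - y_bar) ** 2 for y in draft)
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, draft))

b = sum_xy / sum_x2
s2 = (sum_y2 - sum_xy**2 / sum_x2) / (n - 2)         # s_{y.x}^2, n - 2 d.f.
height_at_0 = y_bar - b * x_bar                      # estimate of alpha - beta X-bar
se = math.sqrt(s2 * (1 / n + x_bar**2 / sum_x2))
t = height_at_0 / se                                 # formula 6.18.1

print(f"b = {b:.2f}, s^2 = {s2:.1f}")
print(f"height at X = 0: {height_at_0:.1f}, s.e. {se:.2f}, t = {t:.1f} ({n - 2} d.f.)")
```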

When the line is straight and passes through (0, 0), the variance of the residual $\varepsilon$ is sometimes not constant, but increases as X moves away from zero. On plotting, the points lie close to the line when X is small but diverge from it as X increases. The extension of the method of least squares to this case gives the estimate $b = \Sigma w_X XY/\Sigma w_X X^2$, where $w_X$ is the reciprocal of the variance of $\varepsilon$ at the value of X in question.

If numerous observations of Y have been made at each selected X, the variance of $\varepsilon$ can be estimated directly for each X and the form of the functions $w_X$ determined empirically. If there are not enough data to use this method, simple functions that seem reasonable are employed. A common one when all X's are positive is to assume that the variance of $\varepsilon$ is proportional to X, so that $w_X = k/X$, where k is a constant. This gives the simple estimate $b = \Sigma Y/\Sigma X = \bar Y/\bar X$. The weighted mean square of the residuals from the fitted line is

$$s_{y\cdot x}^2 = \{\Sigma(Y^2/X) - (\Sigma Y)^2/\Sigma X\}/(n - 1)$$

and the estimated standard error of b is $s_{y\cdot x}/\sqrt{\Sigma X}$.


TABLE 6.18.2
Number of Acres in Corn on 25 Farms in South Dakota (1942)
Selected by Farm Size

Size of Farm (acres), X    Acres in Corn, Y    Range    Standard Deviation, $s_y$    Ratio $s_y/X$    Ratio $Y/X$
Sometimes the standard deviation of $\varepsilon$ is proportional to X, so that $w_X = k/X^2$. This leads to the least squares estimate

$$b = \Sigma(XY/X^2)/\Sigma(X^2/X^2) = \Sigma(Y/X)/n,$$

in other words, the mean of the individual ratios Y/X. This model is illustrated by the data in table 6.18.2, taken from a farm survey in eastern South Dakota in 1942, in which the size of the farm X and the number of acres in corn Y were measured. Five of the commoner farm sizes, 80, 160, 240, 320, and 400 acres, were drawn. For each size, five farm records were drawn at random.

The ranges of the several groups of Y indicate that $\sigma$ is increasing with X. The same thing is shown in figure 6.18.1. To get more detailed information, $s_y$ was calculated for each group, then the ratio of $s_y$ to X. These ratios are so nearly constant as to justify the assumption that in the population $\sigma_Y/X$ is a constant. Also it seems reasonable to suppose that O(0, 0) is a point on the regression line.

The value of b, 0.243 corn acres per farm acre, is computed in table 6.18.2 as the mean of the ratios Y/X. The sample regression line is $\hat Y = 0.243X$.



Fig. 6.18.1 – Regression of corn acres on farm acres.
To find the estimated variance of b, first compute the sum of squares of deviations of the 25 ratios R = Y/X from their mean, and divide by n − 1 = 24. This gives $s_R^2 = 0.008069$. Then

$$s_b^2 = \frac{s_R^2}{n} = \frac{0.008069}{25} = 0.0003228, \qquad s_b = 0.0180, \qquad \text{d.f.} = n - 1 = 24$$

The 95% interval estimate of $\beta$ is set in the usual way,

$$b - t_{0.05}s_b < \beta < b + t_{0.05}s_b,$$

the result being $0.206 < \beta < 0.280$.
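The interval for the mean of ratios needs only $s_R^2$, n and a tabled t. A minimal Python sketch: `mean_of_ratios` is a hypothetical helper for raw data, while the corn figures use the summary values quoted above, because the 25 farm records are not reproduced here. The value t = 2.064 is the tabled 5% point for 24 d.f.

```python
import math


def mean_of_ratios(X, Y):
    """Estimate b = mean of Y/X (appropriate when the SD of eps is proportional
    to X) and its standard error s_R / sqrt(n), with n - 1 d.f."""
    ratios = [y / x for x, y in zip(X, Y)]
    n = len(ratios)
    b = sum(ratios) / n
    s2_r = sum((r - b) ** 2 for r in ratios) / (n - 1)
    return b, math.sqrt(s2_r / n)


# The corn survey, using the summary figures quoted in the text.
b, s2_r, n = 0.243, 0.008069, 25
s_b = math.sqrt(s2_r / n)
t_05 = 2.064     # assumed: tabled two-sided 5% point of t for 24 d.f.
print(f"s_b^2 = {s2_r / n:.7f}, s_b = {s_b:.4f}")
print(f"95% limits: {b - t_05 * s_b:.3f} to {b + t_05 * s_b:.3f}")
```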

In straight lines through the origin the point $(\bar X, \bar Y)$ does not in general lie on the fitted line. In the figure, (240, 56.28) falls below the line. An exception occurs when $\sigma^2$ is proportional to X, giving $b = \bar Y/\bar X$, as we have seen.

6.19 – The estimation of ratios. With data in which it is believed that Y is proportional to X, apart from sampling or experimental error, the investigator is likely to regard his objective as that of estimating the common ratio Y/X rather than as a problem in regression. If his conjecture is correct, that is, if $Y = \beta X + \varepsilon$, the three quantities $\Sigma XY/\Sigma X^2$, $\Sigma Y/\Sigma X$, and $\Sigma(Y/X)/n$ are all unbiased estimates of the population ratio $\beta$. The choice among the three is a question of precision. The most precise estimate is the first, second, or third above according as the variance of $\varepsilon$ is constant, proportional to X, or proportional to $X^2$. If the variance of $\varepsilon$ is expected to increase moderately as X increases, though the exact rate is not known, the estimate $\Sigma Y/\Sigma X$ usually does well, in addition to being the simplest of the three.

Before one of these estimates is adopted, always check that Y is proportional to X by plotting the data and, if necessary, testing the null hypothesis that the line goes through the origin. Hasty adoption of some form of ratio estimate may lose the information that Y/X is not constant as X varies.
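All three ratio estimates are weighted least squares slopes through the origin, with weights $w_X = X^{-p}$ for p = 0, 1, 2. A minimal Python sketch showing that one weighted formula reproduces each estimate; the function name and the five data points are hypothetical, for illustration only.

```python
def origin_slope(X, Y, power):
    """Weighted least squares slope through (0, 0) with weights w_X = X**(-power).
    power = 0: variance of eps constant        -> Sigma XY / Sigma X^2
    power = 1: variance proportional to X      -> Sigma Y / Sigma X
    power = 2: variance proportional to X^2    -> mean of the ratios Y/X
    All X must be positive."""
    w = [x ** (-power) for x in X]
    num = sum(wi * x * y for wi, x, y in zip(w, X, Y))
    den = sum(wi * x * x for wi, x in zip(w, X))
    return num / den


# Hypothetical data, for illustration only.
X = [80, 160, 240, 320, 400]
Y = [22, 35, 61, 70, 101]
for p, label in [(0, "Sigma XY / Sigma X^2"), (1, "Sigma Y / Sigma X"), (2, "mean of Y/X")]:
    print(f"{label:>22}: {origin_slope(X, Y, p):.4f}")
```

When Y is exactly proportional to X the three estimates coincide; they differ only in how heavily they weight the points with large X.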

6.20 – Summary. The six sample values, n, $\bar X$, $\bar Y$, $\Sigma x^2$, $\Sigma y^2$, $\Sigma xy$, furnish all regression information about the population line $\mu = \alpha + \beta x$:

1. The regression coefficient of Y on X: $b = \Sigma xy/\Sigma x^2$. The estimate of $\alpha$: $a = \bar Y$.

2. The sample regression equation of Y on X: $\hat Y = \bar Y + bx$

3. Y adjusted for X: Adjusted $Y = Y - bx$

4. The sum of squares attributable to regression: $(\Sigma xy)^2/\Sigma x^2 = \Sigma \hat y^2$

5. The sum of squares of deviations from regression: $\Sigma y^2 - (\Sigma xy)^2/\Sigma x^2 = \Sigma d_{y\cdot x}^2$

6. The mean square deviation from regression: $\Sigma d_{y\cdot x}^2/(n - 2) = s_{y\cdot x}^2$

7. The sample standard error of $\bar Y$: $s_{\bar y} = s_{y\cdot x}/\sqrt{n}$

8. The sample standard deviation of the regression coefficient: $s_b = s_{y\cdot x}/\sqrt{\Sigma x^2}$

9. The sample standard deviation of $\hat Y$ as an estimate of $\mu = \alpha + \beta x$: $s_{\hat Y} = s_{y\cdot x}\sqrt{1/n + x^2/\Sigma x^2}$

10. The sample standard deviation of $\hat Y$ as an estimate of a new point Y: $s_{Y-\hat Y} = s_{y\cdot x}\sqrt{1 + 1/n + x^2/\Sigma x^2}$

11. The estimated height of the line when X = 0: $\bar Y - b\bar X$. This is sometimes called the intercept or the elevation of the line.
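The eleven items can be collected into one function of the six sample values. A minimal Python sketch; `regression_summary` and its dictionary keys are hypothetical names, and the functions of x expect x as a deviation $X - \bar X$. It is checked here on the small data set of table 6.15.1.

```python
import math


def regression_summary(n, x_bar, y_bar, sum_x2, sum_y2, sum_xy):
    """Items 1-11 of the summary, computed from the six sample values.
    Functions of x take x as a deviation X - X_bar."""
    b = sum_xy / sum_x2                            # item 1
    ss_reg = sum_xy**2 / sum_x2                    # item 4
    ss_dev = sum_y2 - ss_reg                       # item 5
    s2 = ss_dev / (n - 2)                          # item 6
    s = math.sqrt(s2)
    return {
        "b": b,
        "a": y_bar,                                                   # item 1
        "predict": lambda x: y_bar + b * x,                           # item 2
        "adjust": lambda Y, x: Y - b * x,                             # item 3
        "ss_reg": ss_reg,
        "ss_dev": ss_dev,
        "s2_yx": s2,
        "s_ybar": s / math.sqrt(n),                                   # item 7
        "s_b": s / math.sqrt(sum_x2),                                 # item 8
        "s_mean": lambda x: s * math.sqrt(1 / n + x**2 / sum_x2),     # item 9
        "s_new": lambda x: s * math.sqrt(1 + 1 / n + x**2 / sum_x2),  # item 10
        "height_at_X0": y_bar - b * x_bar,                            # item 11
    }


summary = regression_summary(n=7, x_bar=8, y_bar=6, sum_x2=112, sum_y2=68, sum_xy=56)
print(summary["b"], summary["ss_reg"], summary["ss_dev"],
      summary["s2_yx"], summary["height_at_X0"])
```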
Chapter 7: Correlation


7.1 – Introduction. The correlation coefficient is another measure of the mutual relationship between two variables. Table 7.1.1 and figure 7.1.1 show the heights of 11 brothers and sisters, drawn from a large family study by Pearson and Lee (1). Since there is no reason to think of one height as the dependent variable and the other as the independent variable, the heights are designated $X_1$ and $X_2$ instead of Y and X. To find the sample correlation coefficient, denoted by r, compute $\Sigma x_1^2$, $\Sigma x_2^2$, and $\Sigma x_1x_2$ as in the previous chapter. Then,

$$r = \frac{\Sigma x_1x_2}{\sqrt{(\Sigma x_1^2)(\Sigma x_2^2)}} = 0.558,$$

as shown under table 7.1.1. Roughly speaking, r is a quantitative expres- 
sion of the commonly observed similarity among children of the same 
parents — the tendency of the taller sisters to have the taller brothers. In 
the figure, the value r = 0.558 reflects the propensity of the dots to lie in 
a band extending from lower left to upper right instead of being scattered 
randomly over the whole field. The band is often shaped like an ellipse, 
with the major axis sloping upward toward the right when r is positive. 

EXAMPLE 7.1.1 – Calculate r = 1 for the following pairs:

X₁: 1, 2, 3, 4, 5
X₂: 3, 5, 7, 9, 11


TABLE 7.1.1
Stature (Inches) of Brother and Sister
(Illustration taken from Pearson and Lee’s sample of 1,401 families)

Family Number    1    2    3    4    5    6    7    8    9    10   11
Brother, X₁      71   68   66   67   70   71   70   73   72   65   66
Sister, X₂       69   64   65   63   65   62   65   64   66   59   62

$n = 11$, $\bar X_1 = 69$, $\bar X_2 = 64$, $\Sigma x_1^2 = 74$, $\Sigma x_2^2 = 66$, $\Sigma x_1x_2 = 39$

$r = \Sigma x_1x_2/\sqrt{(\Sigma x_1^2)(\Sigma x_2^2)} = 39/\sqrt{(74)(66)} = 0.558$. Pearson and Lee’s r = 0.553.
Fig. 7.1.1 – Scatter (or dot) diagram of stature of 11 brother-sister pairs, r = 0.558.

Represent the data in a graph similar to figure 7.1.1.

EXAMPLE 7.1.2 – Verify r = 0.91 in the pairs:

X₁: 2, 5, 6, 8, 10, 12, 14, 15, 18, 20
X₂: 1, 2, 2, 3, 2, 4, 3, 4, 4, 5

Plot the elliptical band of points.

EXAMPLE 7.1.3 – In the following, show that r = 0.20:

X₁: 3, 5, 8, 11, 12, 12, 17
X₂: 11, 5, 6, 8, 7, 18, 9

Observe the scatter of the points in a diagram.

EXAMPLE 7.1.4 – In the apple data of table 6.9.1, $\Sigma x^2 = 924$, $\Sigma y^2 = 1{,}222$, $\Sigma xy = -936$. Calculate r = −0.88.
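The correlation coefficients in table 7.1.1 and examples 7.1.1–7.1.4 can all be checked with one short function. A minimal Python sketch; `corr` is a hypothetical helper implementing the formula of section 7.1, and example 7.1.4 uses the quoted sums directly because the apple data are not repeated here.

```python
import math


def corr(x1, x2):
    """Sample correlation r = Sigma x1 x2 / sqrt(Sigma x1^2 * Sigma x2^2),
    where x1 and x2 are deviations from the respective means."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s12 / math.sqrt(s11 * s22)


brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]   # table 7.1.1
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]

ex1 = corr([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
ex2 = corr([2, 5, 6, 8, 10, 12, 14, 15, 18, 20], [1, 2, 2, 3, 2, 4, 3, 4, 4, 5])
ex3 = corr([3, 5, 8, 11, 12, 12, 17], [11, 5, 6, 8, 7, 18, 9])
ex4 = -936 / math.sqrt(924 * 1222)   # example 7.1.4, from the quoted sums

print(f"table 7.1.1: r = {corr(brother, sister):.3f}")
print(f"examples 7.1.1-7.1.4: {ex1:.2f}, {ex2:.2f}, {ex3:.2f}, {ex4:.2f}")
```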

7.2 — The sample correlation coefficient r. The correlation coefficient 
is a measure of the degree of closeness of the linear relationship between 
two variables. 

Two properties of r should be noted:

(i) r is a pure number without units or dimensions, because the scales of its numerator and denominator are both the products of the scales in which $X_1$ and $X_2$ are measured. One useful consequence is that r can be computed from coded values of $X_1$ and $X_2$. No decoding is required.
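Property (i) is easy to confirm on the brother-sister heights. A minimal Python sketch, assuming a hypothetical coding (subtract 60 from every height and double the sister values); any coding that subtracts a constant and multiplies by a positive number leaves r unchanged, while a negative multiplier changes only its sign.

```python
import math


def corr(x1, x2):
    """Sample correlation r = Sigma x1 x2 / sqrt(Sigma x1^2 * Sigma x2^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s12 / math.sqrt(s11 * s22)


brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]   # table 7.1.1
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]

# Hypothetical coding: subtract 60 from each height, then double the sister values.
coded_brother = [h - 60 for h in brother]
coded_sister = [2 * (h - 60) for h in sister]

r_raw = corr(brother, sister)
r_coded = corr(coded_brother, coded_sister)
r_reversed = corr(coded_brother, [-h for h in coded_sister])   # negative multiplier
print(f"raw r = {r_raw:.4f}, coded r = {r_coded:.4f}, reversed scale r = {r_reversed:.4f}")
```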
(ii) r always lies between −1 and +1 (proved in the next section, 7.3). Positive values of r indicate a tendency of $X_1$ and $X_2$ to increase together. When r is negative, large values of $X_1$ are associated with small values of $X_2$.

To help you acquire some experience of the nature of r, a number of simple tables with the corresponding graphs are displayed in figure 7.2.1. In each of these tables n = 9, $\bar X_1 = 12$, $\bar X_2 = 6$, $\Sigma x_1^2 = 576$, $\Sigma x_2^2 = 144$. Only $\Sigma x_1x_2$ changes, and with it the value of r. Since $\sqrt{(\Sigma x_1^2)(\Sigma x_2^2)} = \sqrt{(576)(144)} = 288$, the correlation is easily evaluated in the several tables by calculating $\Sigma x_1x_2$ and dividing by 288 (or multiplying by 1/288 = 0.0034722... if a machine is used).

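In A, the nine points lie on a straight line, the condition for $r = 1$.

[Figure 7.2.1: tables and graphs A to F, each with $n = 9$. Recoverable values: A, $X_1$: 0, 4, 6, 8, 12, 14, 16, 22, 26; $X_2$: 0, 2, 3, 4, 6, 7, 8, 11, 13. B, $X_1$: as in A; $X_2$: 0, 2, 4, 3, 7, 6, 8, 11, 13.]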
The line is a “degenerate” ellipse: it has length but no width. The two variables keep in perfect step, and any change in one is accompanied by a proportionate change in the other. B depicts some deviation from an exact relationship; the ellipse is long and thin, with $r$ slightly reduced below 1. In C, the ellipse widens, and it reaches circularity in D, where $r = 0$. This denotes no relation between the two variables. E and F show negative correlations tending toward $-1$.

To summarize, the thinness of the ellipse of points exhibits the magnitude of $r$, while the inclination of the axis upward or downward shows its sign. Note that the slope of the axis is determined by the scales of measurement adopted for the two axes of the graph. It is therefore not a reliable indicator of the magnitude of $r$. It is the concentration of the points near the axis of the ellipse that signifies high correlation.

The larger correlations, either positive or negative, are fairly obvious from the graphs. It is not so easy to make a visual evaluation if the absolute value of $r$ is less than 0.5. Even the direction of inclination of the ellipse may elude you if $r$ is between $-0.3$ and $+0.3$.

In these small samples a single dot can make a lot of difference. In D, for example, if the point (26, 0) were changed to (26, 9), $r$ would be increased from 0 to 0.505. This emphasizes the fact that sample correlations from a bivariate population in which the correlation is $\rho$ are quite variable if $n$ is small.

In assessing the value of $r$ in a table, select some extreme values of one variable and note whether they are associated with extreme values of the other. If no such tendency can be detected, $r$ is likely to be small.
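The 0.505 figure can be checked from the summary values given for figure 7.2.1. The sketch below rests on three assumptions:

- It takes $\sum x_1x_2 = 0$ in D, from the statement that $r = 0$ there.
- It uses $\bar{X}_2 = 6$ to get $\sum X_2 = 54$.
- It uses $\sum X_2^2 = 144 + 9(6^2) = 468$.

Because $\sum x_1x_2 = \sum (X_1 - \bar{X}_1)X_2$ and $\bar{X}_1$ does not change, raising one $X_2$ by 9 at $X_1 = 26$ adds $(26 - 12)(9) = 126$ to the numerator.

```python
import math

n = 9
mean_x1 = 12.0
sxx1 = 576.0        # Σx1², unchanged because no X1 value is altered
sum_x2 = 54.0       # 9 × the mean of 6
sum_x2_sq = 468.0   # ΣX2² = Σx2² + (ΣX2)²/n = 144 + 324
sxy = 0.0           # r = 0 in diagram D

# Move the point (26, 0) to (26, 9).
old_x2, new_x2, x1_at_point = 0.0, 9.0, 26.0
sxy += (x1_at_point - mean_x1) * (new_x2 - old_x2)
sum_x2 += new_x2 - old_x2
sum_x2_sq += new_x2 ** 2 - old_x2 ** 2
sxx2 = sum_x2_sq - sum_x2 ** 2 / n   # new corrected sum of squares of X2

r_new = sxy / math.sqrt(sxx1 * sxx2)
print(sxy, sxx2, round(r_new, 3))    # 126.0 108.0 0.505
```

Note that $\sum x_2^2$ falls from 144 to 108. This happens because the moved point was 6 below the mean and ends up only 3 above the new mean.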

Perfect correlation (r = 1) rarely occurs in biological data, though 
values as high as 0.99 are not unheard of. Each field of investigation has 
its own range of coefficients. Inherited characteristics such as height ordi- 
narily have correlations between 0.35 and 0.55. Among high school 
grades r averages around 0.35 (3). Pearson and Lee got “organic correla- 
tions,” that is, correlations between two such measurements as stature and 
span in the same person, ranging from 0.60 to 0.83. Brandt (2) calculated 
the sample correlation, 0.986, between live weight and warm dressed 
weight of 533 swine. Eward et al. (6) estimated $r = -0.68$ between average daily gain of swine and feed required per pound gained.

7.3 — Relation between the sample coefficients of correlation and regression. If $X_2$ is designated as the dependent variable, its regression coefficient on $X_1$, say $b_{21}$, is $\sum x_1x_2/\sum x_1^2$. But if $X_1$ is taken as dependent, its regression coefficient on $X_2$ is $b_{12} = \sum x_1x_2/\sum x_2^2$.

The two regression lines are shown in each diagram of figure 7.2.1. The two lines are the same only if $r = \pm 1$, as illustrated in A, although they are close together if $r$ is near $\pm 1$. In the diagrams the regression of $X_1$ on $X_2$ is always the line that makes the lesser angle with the vertical axis.
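The sketch below computes both regression coefficients from the brother-sister sums of section 7.1 ($\sum x_1x_2 = 39$, $\sum x_1^2 = 74$, $\sum x_2^2 = 66$). Their product equals $r^2$, since $b_{21}b_{12} = (\sum x_1x_2)^2/(\sum x_1^2\sum x_2^2)$.

Both lines pass through $(\bar{X}_1, \bar{X}_2)$. On common axes the $X_1$-on-$X_2$ line has slope $1/b_{12}$, so the lines coincide only when $b_{21}b_{12} = r^2 = 1$.

```python
import math

sxy, sxx1, sxx2 = 39.0, 74.0, 66.0   # brother-sister sums from section 7.1

b21 = sxy / sxx1   # regression of X2 on X1
b12 = sxy / sxx2   # regression of X1 on X2
r = sxy / math.sqrt(sxx1 * sxx2)

# Plotted on (X1, X2) axes, the X1-on-X2 line has slope 1/b12, so the two
# lines have equal slopes (and hence coincide) only if b21 * b12 = 1.
print(round(b21, 3), round(b12, 3))        # 0.527 0.591
print(math.isclose(b21 * b12, r ** 2))     # True
```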

The fact that there are two different regressions is puzzling at first sight. In mathematics, the equation by which we calculate $X_2$ when given $X_1$ is the same as the equation by which $X_1$ is calculated when $X_2$ is given. In correlation and regression problems, however, we are dealing with relationships that are not followed exactly.

For any fixed $X_1$ there is a whole population of values of $X_2$. The regression of $X_2$ on $X_1$ is the line that relates the average of these values of $X_2$ to $X_1$. Similarly, for each $X_2$ there is a population of values of $X_1$, and the regression of $X_1$ on $X_2$ shows the locus of the averages of these populations as $X_2$ changes.

The two lines answer two different questions. They coincide only if the populations shrink to their means, so that $X_1$ and $X_2$ have no individual deviation from the linear relation.

A useful property of $r$ is obtained from the shortcut method of computing $s_{y\cdot x}^2$ in a regression problem. Reverting to $Y$ and $X$, it will be recalled from the end of section 6.2 that

$$\sum d_{y\cdot x}^2 = (n - 2)s_{y\cdot x}^2 = \sum y^2 - \frac{(\sum xy)^2}{\sum x^2}$$

Substituting $(\sum xy)^2 = r^2\sum x^2\sum y^2$, we have

$$\sum d_{y\cdot x}^2 = (n - 2)s_{y\cdot x}^2 = (1 - r^2)\sum y^2 \qquad (7.3.1)$$

Since $\sum d_{y\cdot x}^2$ cannot be negative, this equation shows that $r$ must lie between $-1$ and $+1$. Moreover, if $r$ is $\pm 1$, $\sum d_{y\cdot x}^2$ is zero and the sample points lie exactly on a line.
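Identity (7.3.1) can be checked numerically. The sketch fits the line, sums the squared deviations directly, and compares the result with $(1 - r^2)\sum y^2$. It uses the pairs of example 7.1.3; `regression_check` is a hypothetical helper name.

```python
import math

def regression_check(xs, ys):
    """Return (residual SS about the fitted line, (1 - r^2) * Σy^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Deviations of each Y from the fitted line Y_hat = mean_y + b(X - mean_x)
    resid_ss = sum((y - (my + b * (x - mx))) ** 2 for x, y in zip(xs, ys))
    r_sq = sxy ** 2 / (sxx * syy)
    return resid_ss, (1 - r_sq) * syy

direct, via_r = regression_check([3, 5, 8, 11, 12, 12, 17], [11, 5, 6, 8, 7, 18, 9])
print(math.isclose(direct, via_r))   # True
```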

The result (7.3.1) provides another way of appraising the closeness of the relation between two variables. When no regression is fitted, the original sample variance of $Y$ is $s_y^2 = \sum y^2/(n - 1)$. The variance of the deviations of $Y$ from the linear regression is $(1 - r^2)\sum y^2/(n - 2)$, as shown above. Hence, the proportion of the variance of $Y$ that is not associated with its linear regression on $X$ is estimated by

$$\frac{s_{y\cdot x}^2}{s_y^2} = \frac{(n - 1)(1 - r^2)}{n - 2} \doteq 1 - r^2$$

if $n$ is at all large. Thus $r^2$ may be described as the proportion of the variance of $Y$ that can be attributed to its linear regression on $X$, while $(1 - r^2)$ is the proportion free from $X$. The quantities $r^2$ and $(1 - r^2)$ are shown in table 7.3.1 for a range of values of $r$.


TABLE 7.3.1

Estimated Proportions of the Variance of Y Associated and
Not Associated With X in a Linear Regression

           Proportion    Proportion                  Proportion    Proportion
           Associated    Not Associated              Associated    Not Associated
   r       r²            (1 - r²)            r       r²            (1 - r²)
 ±0.1      0.01          0.99              ±0.6      0.36          0.64
 ±0.2      0.04          0.96              ±0.7      0.49          0.51
 ±0.3      0.09          0.91              ±0.8      0.64          0.36
 ±0.4      0.16          0.84              ±0.9      0.81          0.19
 ±0.5      0.25          0.75              ±0.95     0.90          0.10
When r is 0.5 or less, only a minor portion of the variation in Y can
be attributed to its linear regression on X. At r = 0.7, about half the 
variance of Y is associated with X, and at r = 0.9, about 80%. In a sample 
of size 200, an r of 0.2 would be significant at the 1% level, but would 
indicate that 96% of the variation of Y was not explainable through its 
relation with X. A verdict of statistical significance shows merely that 
there is a linear relation with non-zero slope. Remember also that con- 
vincing evidence of an association, even though close, does not prove 
that X is the cause of the variation in Y. Evidence of causality must 
come from other sources. 

Another relation between the sample regression and correlation coeffi- 
cients is the following. With Y as the dependent variable, 

$$b = \frac{\sum xy}{\sum x^2} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}}\sqrt{\frac{\sum y^2}{\sum x^2}} = r\frac{s_y}{s_x}$$

Or, equivalently, $r = bs_x/s_y$. Thus $b$ is easily obtained from $r$, and vice versa, if the sample standard deviations are known.

In some applications, a common practice is to use the sample standard deviations as the scale units for measuring the variates $x = X - \bar{X}$ and $y = Y - \bar{Y}$. That is, the original variates $X$ and $Y$ are replaced by $x' = x/s_x$ and $y' = y/s_y$, said to be in standard units. The sample regression line

$$\hat{Y} - \bar{Y} = b(X - \bar{X})$$

then becomes

$$\hat{y}'s_y = bx's_x, \quad \text{or} \quad \hat{y}' = \frac{bs_x}{s_y}x' = rx'$$

where $\hat{y}'$ is the predicted value of $Y$ in standard units. In standard measure, $r$ is the regression coefficient, and the distinction between correlation and regression coefficients disappears.
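Both $r = bs_x/s_y$ and the vanishing of the distinction in standard units can be checked on a small data set. The sketch uses the pairs of example 7.1.3 and Python's `statistics` module. Its `fmean` gives the mean and its `stdev` uses the divisor $n - 1$. The helper `slope` is our own.

```python
import statistics

def slope(xs, ys):
    """Least-squares slope b = Σxy / Σx², with deviations from the means."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [3, 5, 8, 11, 12, 12, 17]   # example 7.1.3
ys = [11, 5, 6, 8, 7, 18, 9]

sx, sy = statistics.stdev(xs), statistics.stdev(ys)
b = slope(xs, ys)
r = b * sx / sy                   # r = b s_x / s_y

# Re-express both variables in standard units and refit the line.
mx, my = statistics.fmean(xs), statistics.fmean(ys)
xs_std = [(x - mx) / sx for x in xs]
ys_std = [(y - my) / sy for y in ys]
b_std = slope(xs_std, ys_std)

print(round(r, 4), round(b_std, 4))   # 0.2027 0.2027
```

The slope fitted in standard units is $r$ itself, matching $\hat{y}' = rx'$.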

7.4 — The bivariate normal distribution. The population correlation coefficient $\rho$ and its sample estimate $r$ are intimately connected with a bivariate population known as the bivariate normal distribution. This distribution is illustrated by table 7.4.1, which shows the joint frequency distributions of height ($X_1$) and length of forearm ($X_2$) for 348 men. The data are from the article by Galton (18) in 1888 in which the term “co-relation” was first proposed.

To be observed in the table are five features: 

(i) Each row and each column in the body of the table is a frequency distribution. Also, the column at the right, headed $f_2$, is the total frequency distribution of $X_2$, length of forearm. The third-to-the-last row below is that of $X_1$, height.

(ii) The frequencies are concentrated in an elliptical area with the major axis inclined upward to the right. There are no very short men with long forearms nor any very tall men with short forearms.

(iii) The frequencies pile up along the major axis, reaching a peak near the center of the distribution. They thin out around the edges, vanishing entirely beyond the borders of the ellipse.

(iv) The center of the table is at $\bar{X}_1 = 67.5$ inches, $\bar{X}_2 = 18.1$ inches. This point happens to fall in the cell containing the greatest frequency, 28 men.

(v) The bivariate frequency histogram can be presented graphically by erecting a column over each cell in the table, the heights of columns being proportional to the cell frequencies. The tallest column would be in the center, surrounded by shorter columns. The heights would decrease toward the perimeter of the ellipse, with no columns beyond the edges. A ridge of tall columns would extend along the major axis.

TABLE 7.4.1

Frequency of Pairs of Measurements of Height and Length of Forearm. Galton’s Data With the Outer Classes Distributed in ...

The shape of the bivariate normal population becomes clear if you imagine an indefinite increase in the total frequency with a corresponding decrease in the areas of the table cells. A smooth surface would overspread the table. It would rise to its greatest height at the center $(\mu_1, \mu_2)$ and fade away to tangency with the XY plane at great distances.

Some properties of this new model are as follows: 

(i) Each section perpendicular to the $X_1$ axis is a normal distribution, and likewise each section perpendicular to the $X_2$ axis. This means that each column and each row in table 7.4.1 is a sample from a normal frequency distribution.

(ii) The frequency distributions perpendicular to the $X_1$ axis all have the same standard deviation, $\sigma_{2\cdot1}$. Their means all lie on a straight regression line, $\mu_{2\cdot1} = \alpha_2 + \beta_{2\cdot1}X_1$. The sample means and standard deviations are recorded in the last two lines of the table. While there is considerable variation in $s_{2\cdot1}$, each is an estimate of the common parameter, $\sigma_{2\cdot1}$.

(iii) The frequency distributions perpendicular to the $X_2$ axis have a common standard deviation, $\sigma_{1\cdot2}$ (note the estimators in the right-hand column of the table). Their means lie on a second regression line,

$$\mu_{1\cdot2} = \alpha_1 + \beta_{1\cdot2}X_2$$

(iv) Each border frequency distribution is normal. That on the right is $\mathcal{N}(\mu_2, \sigma_2)$, while the one below the body of the table is $\mathcal{N}(\mu_1, \sigma_1)$.

(v) The equation of the bivariate normal frequency distribution has the coefficient $1/(2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2})$, followed by $e$ with this exponent:

$$-\frac{1}{2(1 - \rho^2)}\left[\frac{(X_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(X_1 - \mu_1)(X_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(X_2 - \mu_2)^2}{\sigma_2^2}\right]$$

This distribution has five parameters. Four of them are familiar: $\mu_1$, $\sigma_1$, $\mu_2$, $\sigma_2$. The fifth is the correlation coefficient, $\rho$, of which $r$ is an estimator. The parameter $\rho$ measures the closeness of the population relation between $X_1$ and $X_2$; it determines the narrowness of the ellipse containing the major portion of the observations.
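The density can be written as a direct transcription of the coefficient and exponent above. In the sketch, `bivariate_normal_pdf` is a hypothetical helper, and the evaluation point and parameter values are arbitrary. It checks one consequence of the formula: when $\rho = 0$ the surface is the product of the two border normal curves.

```python
import math
from statistics import NormalDist

def bivariate_normal_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density with parameters mu1, mu2, s1, s2 and rho."""
    if not (s1 > 0 and s2 > 0 and -1 < rho < 1):
        raise ValueError("need positive standard deviations and |rho| < 1")
    d1 = (x1 - mu1) / s1
    d2 = (x2 - mu2) / s2
    exponent = -(d1 ** 2 - 2 * rho * d1 * d2 + d2 ** 2) / (2 * (1 - rho ** 2))
    coef = 1 / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))
    return coef * math.exp(exponent)

# With rho = 0 the surface factors into the two border normal curves.
joint = bivariate_normal_pdf(1.0, -0.5, 0, 0, 1, 2, 0.0)
product = NormalDist(0, 1).pdf(1.0) * NormalDist(0, 2).pdf(-0.5)
print(math.isclose(joint, product))   # True
```

At the center $(\mu_1, \mu_2)$ the exponent is zero, so the peak height is the coefficient itself. That height grows as $|\rho|$ approaches 1, matching the picture of a narrowing ellipse.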
EXAMPLE 7.4.1 — Make a graph of $\bar{X}_{2\cdot1}$ in the next-to-the-last line of table 7.4.1. The values of $X_1$ are the class-marks at the top of the columns. The first class mark may be taken as 59.5 inches.

EXAMPLE 7.4.2 — Graph the $\bar{X}_{1\cdot2}$ on the same sheet with that of $\bar{X}_{2\cdot1}$. The class marks for $X_2$ are laid off on the vertical axis. The first class mark may be taken as 21.25 inches. If you are surprised that the two regression lines are different, remember that $\bar{X}_{2\cdot1}$ is the mean of a column while $\bar{X}_{1\cdot2}$ is the mean of a row.

EXAMPLE 7.4.3 — Graph $s_{2\cdot1}$ against $X_1$. You will see that there is no trend, indicating that all the $s_{2\cdot1}$ may be random samples from a common $\sigma_{2\cdot1}$.

EXAMPLE 7.4.4 — The data in example 6.9.3 may be taken as a random sample from a bivariate normal population. You had $\bar{X} = 83$ gms., $\bar{Y} = 60$ mg., $\sum x^2 = 1{,}000$, $\sum y^2 = 6{,}854$, $\sum xy = 2{,}302$. Calculate the regression of body weight, $X$, on comb weight, $Y$. Ans. $\hat{X} = 83 + 0.336(Y - 60)$ gms. Draw the graph of this line along with that of example 6.9.4. Notice that the angle whose tangent is 0.336 is measured from the $Y$ axis.

EXAMPLE 7.4.5 — In the chick experiment, estimate $\sigma_{y\cdot x}$. Ans. $s_{y\cdot x} = 13.9$ mg. Also estimate $\sigma_{x\cdot y}$. Ans. $s_{x\cdot y} = 15.1$ gms. In $s_{x\cdot y}$, the deviations from regression are measured horizontally.

EXAMPLE 7.4.6 — From the chick data, estimate $\rho$. Ans. $r = 0.88$.

EXAMPLE 7.4.7 — If $y = a + bu$ and $x = c + dv$, where $a$, $b$, $c$, and $d$ are constants with $b$ and $d$ of the same sign, prove that $r_{xy} = r_{uv}$.

EXAMPLE 7.4.8 — Thirty students scored as follows in two mathematics achievement 
tests: 


I 73 41 83 71 39 60 51 41 85 88 44 71 52 74 50 

II 29 24 34 27 24 26 35 18 33 39 27 35 25 29 13 


I 43 85 53 85 44 66 60 33 43 76 51 57 35 40 76 

II 13 40 23 40 22 25 21 26 19 29 25 19 17 17 35 


Calculate $r = 0.774$.


From the formula for $r$ we can derive a much used expression for $\rho$. Write

$$r = \frac{\sum x_1x_2}{\sqrt{(\sum x_1^2)(\sum x_2^2)}}$$

Dividing numerator and denominator by $(n - 1)$, we have

$$r = \frac{\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)/(n - 1)}{s_1s_2} \qquad (7.4.1)$$

As $n$ becomes large, three things happen:

- $\bar{X}_1$ and $\bar{X}_2$ tend to coincide with $\mu_1$ and $\mu_2$, respectively;
- $s_1$ and $s_2$ tend to equal $\sigma_1$ and $\sigma_2$;
- division by $(n - 1)$ becomes equivalent to division by $n$.

Hence, when applied to the whole population, equation 7.4.1 becomes

$$\rho = \frac{\text{Average value of } (X_1 - \mu_1)(X_2 - \mu_2)}{\sigma_1\sigma_2} \qquad (7.4.2)$$
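Equation (7.4.2) can be illustrated by simulation. The sketch generates bivariate normal pairs with a device not described in this section, so treat it as an assumption. With $Z_1, Z_2$ independent standard normal deviates, the variables

- $X_1 = \mu_1 + \sigma_1Z_1$ and
- $X_2 = \mu_2 + \sigma_2(\rho Z_1 + \sqrt{1 - \rho^2}Z_2)$

have correlation $\rho$. The population means and standard deviations below are hypothetical, as are the seed and number of pairs.

```python
import math
import random

# Hypothetical population parameters, chosen only for illustration.
MU1, MU2 = 10.0, 20.0
SIGMA1, SIGMA2 = 2.0, 3.0

def simulated_rho(rho: float, n: int = 20_000, seed: int = 1) -> float:
    """Average of (X1 - mu1)(X2 - mu2) over n simulated pairs, divided by sigma1*sigma2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = MU1 + SIGMA1 * z1
        x2 = MU2 + SIGMA2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        total += (x1 - MU1) * (x2 - MU2)
    return (total / n) / (SIGMA1 * SIGMA2)

print(round(simulated_rho(0.5), 2))   # should land close to 0.5
```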
this process, drawing a hundred or more pairs, and then compute the correlation, you will get a value of $r$ not greatly different from the population value,

$$\rho = 3/\sqrt{(4)(5)} = 0.67$$

The numerator of this fraction is the number of common elements, while the denominator is the geometric mean of the total numbers of elements in the two sums, $X_1$ and $X_2$. Suppose $n_{12}$ represents the number of common elements, and $n_{11}$ and $n_{22}$ are the total numbers of elements making up the two sums. Then the correlation between these two sums is, theoretically,

$$\rho = n_{12}/\sqrt{n_{11}n_{22}}$$

Of course, there will be sampling variation in the values calculated from drawings. You may be lucky enough to get a good verification with only 10 or 20 pairs of sums. With 50 pairs we have usually got a coefficient within a few hundredths of the expected parameter, but once we got 0.28 when the population value was

$$\rho = 6/\sqrt{(9)(16)} = 0.5$$

If you put the same number of elements into $X_1$ and $X_2$, then $n_{11} = n_{22}$. Denoting this common number of total elements by $n$,

$$\rho = n_{12}/n,$$

the ratio of the number of common elements to the total number in each sum. In this special case, the correlation coefficient is simply the fraction of the elements which are common.

Roughly, this is the interpretation of the sister-brother correlation in stature (table 7.1.1), which is usually not far from 0.5: an average of some 50% of the genes determining height is common to sister and brother.
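The drawing experiment with common elements can be mimicked in code. The sketch makes some assumptions not found in the text:

- each "element" is a standard normal deviate from Python's `random` module rather than a drawing from a table;
- the seed and the number of pairs are arbitrary.

With $n_{12} = 3$ shared elements, $n_{11} = 4$ and $n_{22} = 5$, the sample $r$ should settle near $3/\sqrt{20} = 0.67$ when many pairs are drawn.

```python
import math
import random

def r_from_pairs(pairs):
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    sxy = sum((a - m1) * (b - m2) for a, b in pairs)
    sxx = sum((a - m1) ** 2 for a, _ in pairs)
    syy = sum((b - m2) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def common_element_sums(n11, n22, n12, n_pairs, seed=0):
    """Each pair shares n12 drawings; X1 gets n11 - n12 more, X2 gets n22 - n12 more."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        common = sum(rng.gauss(0, 1) for _ in range(n12))
        x1 = common + sum(rng.gauss(0, 1) for _ in range(n11 - n12))
        x2 = common + sum(rng.gauss(0, 1) for _ in range(n22 - n12))
        pairs.append((x1, x2))
    return pairs

theory = 3 / math.sqrt(4 * 5)
sample = r_from_pairs(common_element_sums(4, 5, 3, n_pairs=20_000))
print(round(theory, 2), round(sample, 2))   # theory is 0.67; sample should be close
```

Rerunning with `n_pairs=50` and different seeds shows the sampling variation the text describes.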

Another illustration of this special case arises from the determination 
of some physical or chemical constant by two alternative methods. Con- 
sider the estimation of the potassium content of the expressed sap of corn 
stems as measured by two methods, the colorimetric and the gravimetric. 
Two samples are taken from the same source, one being treated by each 
of the two techniques. The common element in the two results is the actual 
potassium content. Extraneous elements are differences that may exist 
between the potassium contents of the two samples that were drawn, and 
the errors of measurement of the two procedures. 

The concept of common elements has been presented because it may 
help you to a better understanding of correlation. But it is not intended 
as a method of interpreting the majority of the correlations that you will 
come across in your work, since it applies only in the type of special cir- 
cumstances that we have illustrated. 

When you have carried through some calculations of r with common 
elements, you are well aware of the sampling variation of this statistic. 
Fig. 7.5.1 — Distribution of sample correlation coefficients in samples of 8 pairs drawn from two normally distributed bivariate populations having the indicated values of $\rho$.

However, it would be too tedious to compute enough coefficients to gain 
a picture of the distribution curve. This has been done mathematically 
from theoretical considerations. In figure 7.5.1 are the curves for samples 
of 8 drawn from populations with correlations zero and 0.8. Even the 
former is not quite normal. The reason for the pronounced skewness of 
the latter is not hard to see. Since the parameter is 0.8, sample values 
can exceed this by no more than 0.2, but may be less than the parameter 
value by as much as ! .8. Whenever there is a limit to the variation of a 
statistic at one end of the scale, with practically none at the other, the dis- 
tribution curve is likely to be asymmetrical Of course, with increasing 
sample size this skewness tends to disappear. Samples of 400 pairs, drawn 
from a* population with a correlation even as great as 0.8, have little 
tendency to range more than 0.05 on either side of the parameter. Conse- 
quently, the upper limit, unity, would not constitute a restriction, and the 
distribution would be almost normal. 
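The skewness in figure 7.5.1 can be reproduced roughly by simulation rather than by the mathematical derivation behind the figure. The sketch generates bivariate normal pairs from independent standard normal deviates as $X_2 = \rho Z_1 + \sqrt{1 - \rho^2}Z_2$; this device is an assumption, not taken from the text. It draws 5,000 samples of 8 pairs from $\rho = 0.8$ and measures how far the 5th and 95th percentiles of $r$ lie from $\rho$. The seed and number of replicates are arbitrary.

```python
import math
import random
import statistics

def sample_r(n, rho, rng):
    """r for one sample of n pairs from a bivariate normal population with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def tail_spread(n, rho, reps=5_000, seed=7):
    """Distances from rho down to the 5th and up to the 95th percentile of simulated r."""
    rng = random.Random(seed)
    rs = [sample_r(n, rho, rng) for _ in range(reps)]
    cuts = statistics.quantiles(rs, n=20)   # 19 cut points: 5%, 10%, ..., 95%
    return rho - cuts[0], cuts[-1] - rho

below, above = tail_spread(8, 0.8)
print(round(below, 2), round(above, 2))  # lower tail much longer than upper
```

With samples of 400 pairs, both distances shrink to a few hundredths. For example, `tail_spread(400, 0.8, reps=500)` shows this, in line with the remark about 400 pairs above.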

EXAMPLE 7.5.1 — In a tea plantation (5), the production of S 6 plots during one 14-week period was correlated with the production of the same plots in the following period of equal length. The correlation coefficient was 0.91. Can you interpret this in terms of common elements?

EXAMPLE 7.5.2 — To prove the result that with common elements $\rho = n_{12}/\sqrt{n_{11}n_{22}}$, start from the result (7.4.3), which gives $\rho = \text{Cov}(X_1X_2)/\sigma_1\sigma_2$.

If $X_1$ is the sum of $n_{11}$ independent drawings from a population with standard deviation $\sigma$, then $\sigma_1 = \sigma\sqrt{n_{11}}$. Similarly, $\sigma_2 = \sigma\sqrt{n_{22}}$.

To find $\text{Cov}(X_1X_2)$, write $X_1 = c + u_1$, $X_2 = c + u_2$, where $c$, the common part, is the sum of the same set of $n_{12}$ drawings. Assuming that the drawings are from a population with zero mean, $X_1$ and $X_2$ will have zero means. Thus, $\text{Cov}(X_1X_2) =$ Average value of $(X_1X_2) =$ Average value of $(c + u_1)(c + u_2)$.

But this is simply the average of $c^2$, or in other words the variance of $c$. The terms $cu_2$, $cu_1$, and $u_1u_2$ all have averages zero, because $c$, $u_1$, and $u_2$ result from independent drawings. Finally, the variance of $c$ is $\sigma^2n_{12}$, giving $\rho = \sigma^2n_{12}/[(\sigma\sqrt{n_{11}})(\sigma\sqrt{n_{22}})] = n_{12}/\sqrt{n_{11}n_{22}}$.

EXAMPLE 7.5.3 — Suppose that $u_1$, $u_2$, $u_3$ are independent draws from the same population, and that $X_1 = 3u_1 + u_2$, $X_2 = 3u_1 + u_3$. What is the correlation $\rho$ between $X_1$ and $X_2$? Ans. 0.9. More generally, if $X_1 = fu_1 + u_2$, $X_2 = fu_1 + u_3$, then $\rho = f^2/(f^2 + 1)$. This result provides another method of producing pairs of correlated variates.
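The construction in example 7.5.3 is easy to try by machine. The example leaves the population unspecified, so the sketch assumes $u_1, u_2, u_3$ are independent standard normal deviates. It compares the sample $r$ with $f^2/(f^2 + 1)$. It also includes `f_for_rho`, our own rearrangement $f = \sqrt{\rho/(1 - \rho)}$, for choosing $f$ to give a target $\rho$.

```python
import math
import random

def paired_variates(f, n_pairs, seed=3):
    """X1 = f*u1 + u2 and X2 = f*u1 + u3, with u1, u2, u3 independent standard normal."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        u1, u2, u3 = (rng.gauss(0, 1) for _ in range(3))
        out.append((f * u1 + u2, f * u1 + u3))
    return out

def corr(pairs):
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    sxy = sum((a - m1) * (b - m2) for a, b in pairs)
    sxx = sum((a - m1) ** 2 for a, _ in pairs)
    syy = sum((b - m2) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def f_for_rho(rho):
    """Solve rho = f^2 / (f^2 + 1) for f (requires 0 <= rho < 1)."""
    return math.sqrt(rho / (1 - rho))

f = 3
print(f ** 2 / (f ** 2 + 1))                        # 0.9
print(round(corr(paired_variates(f, 20_000)), 2))   # should be near 0.9
```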

7.6 — Testing the null hypothesis $\rho = 0$. From the distribution of $r$ when $\rho = 0$, table A 11 gives the 5% and 1% significance levels of $r$. Note that the table is entered by the degrees of freedom, in this case $n - 2$. (This device was adopted because it enables the same table to be used in more complex problems.)

As an illustration, consider the value $r = 0.597$, which was obtained from a sample of size 9 in diagram C of figure 7.2.1. For 7 d.f., the 5% value of $r$ in table A 11 is 0.666. The observed $r$ is not statistically significant, and the null hypothesis is not rejected. This example throws light on the difficulty of graphical evaluation of correlations, especially when the number of degrees of freedom is small: they may be no more than accidents of sampling.

Since the distribution of $r$ is symmetrical when $\rho = 0$, the sign of $r$ is ignored when making the test.

Among the following correlations, observe how conclusions are 
affected by both sample size and the size of r: 


Number of    Degrees of              Conclusion About
Pairs        Freedom         r       Hypothesis, ρ = 0
   20            18         0.60     Reject at 1% level
  100            98         0.21     Reject at 5% level
   10             8         0.60     Not rejected
   15            13        -0.50     Not rejected
  500           498        -0.15     Reject at 1% level


You now know two methods for testing whether there is a linear relation between the variables $Y$ and $X$. The first is to test the regression coefficient $b_{y\cdot x}$ by calculating $t = b_{y\cdot x}/s_b$ and reading the $t$-table with $(n - 2)$ d.f. The second is the test of $r$. Fisher (8) showed that the two tests are identical. In fact, the table for $r$ can be computed from the $t$-table by means of the relation

$$t = \frac{b_{y\cdot x}}{s_b} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad \text{d.f.} = n - 2 \qquad (7.6.1)$$

(See example 7.6.1.) To illustrate, we found that the 5% level of $r$ for 7 d.f. was 0.666. Let us compute

$$t = \frac{(0.666)\sqrt{7}}{\sqrt{1 - (0.666)^2}} = 2.362$$

Reference to the $t$-table (p. 549) shows that this agrees, apart from rounding in $r$, with the 5% level of $t$ for 7 d.f., 2.365. In practice, use whichever test you prefer.
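Relation (7.6.1) can be run in either direction. Solving it for $r$ gives $r = t/\sqrt{t^2 + (n - 2)}$, which is how a table of critical $r$ values can be built from $t$-table entries. The sketch takes the $t$ value 2.365 from the text. It does not compute $t$ percentage points itself, since Python's standard library has no $t$ distribution. The function names are our own.

```python
import math

def t_from_r(r: float, df: int) -> float:
    """Relation (7.6.1): t = r * sqrt(df) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)

def r_from_t(t: float, df: int) -> float:
    """Inverse of (7.6.1): the r that gives this t for the same df."""
    return t / math.sqrt(t * t + df)

print(round(t_from_r(0.666, 7), 3))   # 2.362
print(round(r_from_t(2.365, 7), 3))   # 0.666
print(round(t_from_r(0.597, 7), 2))   # diagram C: 1.97, below 2.365
```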





This relation raises a subtle point. The $t$-test of $b$ requires only that $Y$ be normally distributed: the values of $X$ may be normal or they may be selected by the investigator. On the other hand, we have stressed that $r$ and $\rho$ are intimately connected with random samples from the bivariate normal distribution. Fisher proved, however, that in the particular case $\rho = 0$, the distribution of $r$ is the same whether $X$ is normal or not, provided that $Y$ is normal.

EXAMPLE 7.6.1 — To prove relation (7.6.1), which connects the $t$-test of $b$ with the test of $r$, you need three relations:

(i) $b_{y\cdot x} = rs_y/s_x$;
(ii) $s_b = s_{y\cdot x}/\sqrt{\sum x^2}$;
(iii) $s_{y\cdot x}^2 = (1 - r^2)\sum y^2/(n - 2)$, as shown in equation (7.3.1), p. 176.

Start with $t = b_{y\cdot x}/s_b$ and make these substitutions to establish the result.

7.7 — Confidence limits and tests of hypotheses about $\rho$. The methods given in this section apply when $\rho$ is not zero. They require the assumption that the $(X, Y)$ or $(X_1, X_2)$ pairs are a random sample from a bivariate normal distribution.

Table A 11 or the $t$-table can be used only for testing the null hypothesis $\rho = 0$. They are unsuited for testing other null hypotheses, such as $\rho = 0.5$ or $\rho_1 = \rho_2$, and for making confidence statements about $\rho$. When $\rho \neq 0$ the shape of the distribution of $r$ changes, becoming skew, as was seen in figure 7.5.1.

A solution of these problems was provided by Fisher (9), who devised a transformation from $r$ to a quantity $z$, distributed almost normally with standard error

$$\sigma_z = \frac{1}{\sqrt{n - 3}}$$

“practically independent of the value of the correlation in the population from which the sample is drawn.” The relation of $z$ to $r$ is given by

$$z = \tfrac{1}{2}\left[\log_e(1 + r) - \log_e(1 - r)\right]$$

Tables A 12 ($r$ to $z$) and A 13 ($z$ to $r$) enable us to change from one to the other with sufficient accuracy. Following are some examples of the use of $z$.

1. It is required to set confidence limits to the value of $\rho$ in the population from which a sample $r$ has been drawn. As an example, consider $r = -0.889$, based on 9 pairs of observations, figure 7.2.1F. From table A 12, $z = 1.417$ corresponds to $r = 0.889$. Since $n = 9$, $\sigma_z = 1/\sqrt{6} = 0.408$. Since $z$ is distributed almost normally, the normal deviate for $P = 0.99$, 2.576, is used whatever the sample size. For $P = 0.99$, we have as confidence limits for $z$,

$$1.417 - (2.576)(0.408) < z < 1.417 + (2.576)(0.408)$$

$$0.366 < z < 2.468$$

Using table A 13 to find the corresponding $r$, and restoring the sign, the 0.99 confidence limits for $\rho$ are given by

$$-0.986 < \rho < -0.350$$
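Tables A 12 and A 13 can be replaced by a computer. The quantity $\tfrac{1}{2}[\log_e(1 + r) - \log_e(1 - r)]$ is the inverse hyperbolic tangent of $r$, so Python's `math.atanh` and `math.tanh` convert in each direction. A minimal sketch of the confidence-limit calculation follows; `rho_limits` is a hypothetical helper. Working with the signed $r$ directly makes the "restore the sign" step automatic.

```python
import math

def fisher_z(r: float) -> float:
    """z = ½[ln(1 + r) − ln(1 − r)], which is atanh(r)."""
    return math.atanh(r)

def z_to_r(z: float) -> float:
    return math.tanh(z)

def rho_limits(r: float, n: int, deviate: float) -> tuple[float, float]:
    """Confidence limits for rho from r, n pairs, and a normal deviate (e.g. 2.576)."""
    z = fisher_z(r)
    half = deviate / math.sqrt(n - 3)     # deviate × σ_z
    return z_to_r(z - half), z_to_r(z + half)

lo, hi = rho_limits(-0.889, 9, 2.576)
print(round(lo, 3), round(hi, 3))         # -0.986 -0.35
```

The same function reproduces example 7.7.2. `rho_limits(0.986, 533, 1.96)` gives limits of about 0.983 and 0.988.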
Emphasis falls on two facts: (i) in small samples the estimate, $r$, is not very reliable; and (ii) the limits are not equally spaced on either side of $r$, a consequence of its skewed distribution.

2. Occasionally, there is reason to test the hypothesis that $\rho$ has some particular value, other than zero, in the sampled population ($\rho = 0$, you recall, is tested by use of table A 11). An example was given in section 7.5, where $r = 0.28$ was observed in a sample of 50 pairs from $\rho = 0.5$. What is the probability of a larger deviation?

For $r = 0.28$, $z = 0.288$, and for $\rho = 0.5$, $z = 0.549$. The difference, $0.549 - 0.288 = 0.261$, has a standard error of $1/\sqrt{n - 3} = 1/\sqrt{47} = 0.1459$. Hence, the normal deviate is $0.261/0.1459 = 1.79$. This does not reach the 5% level: the sample is not as unusual as a 1-in-20 chance.

3. To test the hypothesis that two sample values of $r$ are drawn at random from the same population, convert each to $z$, then test the significance of the difference between the two $z$'s. For two lots of pigs the correlations between gain in weight and amount of feed eaten are recorded in table 7.7.1. The difference between the $z$-values, 0.700, has the mean square

$$\sigma_{z_1 - z_2}^2 = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3} = 0.500 + 0.111 = 0.611$$

The test is completed in the usual manner, calculating the ratio of the difference of the $z$'s to the standard error of this difference. With $P = 0.37$ there is no reason to reject the hypothesis that the $z$'s are from the same population. Hence there is no reason to reject the hypothesis that the $r$'s are from a common population correlation.

4. To test the hypothesis that several $r$'s are from the same $\rho$, and to combine them into an estimate of $\rho$. Several sample correlations may possibly be drawn from a common $\rho$. If this null hypothesis is not rejected, we may wish to combine the $r$'s into an estimate of $\rho$ more reliable than that afforded by any of the separate $r$'s. Lush (14) was interested in an average of the correlations between initial weight and gain in 6 lots of steers. The computations are shown in table 7.7.2. Each $z$ is weighted (multiplied) by the reciprocal of its mean square, so that small samples


TABLE 7.7.1

Test of Significance of the Difference Between Two Correlations of Gain
With Feed Eaten Among Swine

Lot    Pigs in Lot      r        z       1/(n - 3)
 1          5         0.870    1.333      0.500
 2         12         0.560    0.633      0.111
                      Difference = 0.700   Sum = 0.611

σ(z1 - z2) = √0.611 = 0.782      0.700/0.782 = 0.895      P = 0.37
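The comparison in table 7.7.1 can be sketched as follows. `compare_two_r` is a hypothetical helper. Because it uses `math.atanh` rather than the three-decimal entries of table A 12, the deviate comes out about 0.896 instead of 0.895. The $P$ value agrees.

```python
import math
from statistics import NormalDist

def compare_two_r(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Deviate and two-sided P for H0: both r's come from the same rho."""
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    deviate = diff / se
    return deviate, 2 * (1 - NormalDist().cdf(abs(deviate)))

dev, p = compare_two_r(0.870, 5, 0.560, 12)   # the two lots of table 7.7.1
print(round(dev, 3), round(p, 2))             # 0.896 0.37
```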





TABLE 7.7.2

Test of Hypothesis of Common ρ and Estimation of ρ. Correlation Between
Initial Weight and Gain of Steers

                                                     Weighted z     Weighted Square   Corrected
Samples            No. = n   n - 3     r        z     = (n - 3)z     = (n - 3)z²        z
1927 Herefords        4        1     0.929    1.651     1.651           2.726          1.589
1927 Brahmans        13       10     0.570    0.648     6.480           4.199          0.633
1927 Backcrosses      9        6     0.455    0.491     2.946           1.446          0.468
1928 Herefords        6        3    -0.092   -0.092    -0.276           0.025         -0.055
1928 Brahmans        11        8     0.123    0.124     0.992           0.123          0.106
1928 Backcrosses     14       11     0.323    0.335     3.685           1.234          0.321
Total                57       39                       15.478           9.753         14.941
Average z                                     0.397                     6.145          0.383
                                  Average r = 0.377                 χ² = 3.608     r = 0.365


have little weight. The sum of the weighted $z$'s, 15.478, is divided by the sum of the weights, 39, to get the average $\bar{z}_w = 0.397$.

The next column contains the calculations that lead to the test of the hypothesis that the six sample correlations are drawn from a common population correlation. The test is based on a general result. Suppose the $k$ normal variates $z_i$ are all estimates of the same mean $\mu$, but have different variances $\sigma_i^2$. Then

$$\sum w_i(z_i - \bar{z}_w)^2 = \sum w_iz_i^2 - \frac{(\sum w_iz_i)^2}{\sum w_i}$$

is distributed as $\chi^2$ with $(k - 1)$ d.f., where $w_i = 1/\sigma_i^2$. In this application, $w_i = n_i - 3$ and

$$\chi^2 = \sum (n - 3)z^2 - \frac{[\sum (n - 3)z]^2}{\sum (n - 3)} = 9.753 - \frac{(15.478)^2}{39} = 3.610,$$

with 5 degrees of freedom. From table A 5, p. 550, $P = 0.61$, so that $H_0$ is not rejected.
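The calculations of table 7.7.2 can be reproduced as follows, up to the corrected column. The sketch computes each $z$ with `math.atanh` instead of table A 12, so the sums differ from the table in the third decimal: $\sum(n - 3)z$ is about 15.469 rather than 15.478. The conclusions are unchanged. The $P$ for $\chi^2$ is not computed, since Python's standard library has no chi-square distribution.

```python
import math

# (n, r) for the six lots of steers in table 7.7.2
lots = [(4, 0.929), (13, 0.570), (9, 0.455), (6, -0.092), (11, 0.123), (14, 0.323)]

weights = [n - 3 for n, _ in lots]          # w_i = 1 / σ_i² = n_i - 3
zs = [math.atanh(r) for _, r in lots]       # Fisher's z for each r

sum_w = sum(weights)
sum_wz = sum(w * z for w, z in zip(weights, zs))
sum_wz2 = sum(w * z * z for w, z in zip(weights, zs))

z_bar = sum_wz / sum_w
chi_sq = sum_wz2 - sum_wz ** 2 / sum_w      # k - 1 = 5 d.f.
r_bar = math.tanh(z_bar)

print(sum_w, round(z_bar, 3), round(chi_sq, 2), round(r_bar, 3))  # 39 0.397 3.61 0.377
```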

Since the six sample correlations may all have been drawn from the same population, we compute an estimate of the common $\rho$. This is got by reading from table A 13 the correlation 0.377 corresponding to the average $\bar{z}_w = 0.397$. Don't fail to note the great variation in these small sample correlations. The S.D. of $\bar{z}_w$ is $1/\sqrt{39}$.

Fisher pointed out that there is a small bias in $z$, each being too large by

$$\frac{\rho}{2(n - 1)}$$

The bias may usually be neglected. It might be serious if large numbers of correlations were averaged, because the bias accumulates, one bit being added with every $z$.

If there is need to increase accuracy in the calculation of table 7.7.2, the average $r = 0.377$ may be substituted for $\rho$. Then the approximate bias for each $z$ may be deducted, and the calculation of the average $z$ repeated. Since this will decrease the estimated $r$, it is well to guess $\rho$ slightly less than the average $r$. For instance, it may be guessed that $\rho = 0.37$. Then the correction in the first $z$ is $0.37/[2(4 - 1)] = 0.062$, and the corrected $z$ is $1.651 - 0.062 = 1.589$. The other corrected $z$'s are in the last column of the table. The sum of the products,

$$\sum (n - 3)(\text{corrected } z) = 14.941,$$

is divided by 39 to get the corrected mean value of $z$, 0.383. The corresponding correlation is 0.365.

For tables of the distribution of $r$ when $\rho \neq 0$, see reference (4).

EXAMPLE 7.7.1— To get an idea of how the selection of pairs affects correlation, try 
picking the five lowest values of test II (example 7.4.8) together with the six highest. The 
correlation between these 11 scores and the corresponding scores on test I turns out to be 0.89, as against $r = 0.77$ for the original sample.

EXAMPLE 7.7.2 — Set 95% confidence limits to the correlation, 0.986, n = 533, be- 
tween live and dressed weights of swine. Ans. 0.983 - 0.988. 

What would have been the confidence limits if the number of swine had been 25? 
Ans. 0.968 - 0.994. 

EXAMPLE 7.7.3 — In four studies of the correlation between wing and tongue length in bees, Grout (10) found values of $r = 0.731$, 0.354, 0.690, and 0.740, each based on a sample of 44. Test the hypothesis that these are samples from a common $\rho$. Ans. $\chi^2 = 9.164$, d.f. = 3, $P = 0.03$. In only about three trials per 100 would you expect such disagreement among four correlations drawn from a common population. One would like to know more about the discordant correlation, 0.354, before drawing conclusions.

EXAMPLE 7.7.4 — Estimate p in the population from which the three bee correlations, 
0.731, 0.690, and 0.740, were drawn. Ans. 0.721. 

EXAMPLE 7.7.5 — Set 99% confidence limits on the foregoing bee correlation. Note: $r = 0.721$ is based on $\sum(n - 3) = 3 \times 41 = 123$. The value of $z$ is therefore equivalent to a single $z$ from a sample of $123 + 3 = 126$ bees. The confidence limits are 0.590 - 0.815.
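Examples 7.7.4 and 7.7.5 follow the same recipe as table 7.7.2. Average the $z$'s with weights $n - 3$, convert back, and treat the average as a single $z$ whose variance is the reciprocal of the total weight. A minimal sketch follows; the function names are our own.

```python
import math

def pooled_rho(rs, ns):
    """Weighted mean of Fisher's z (weights n - 3), returned as r with the total weight."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar), sum(weights)

def limits_from_pooled(r_pooled, total_weight, deviate):
    """Confidence limits treating z_bar as one z with variance 1 / total_weight."""
    z = math.atanh(r_pooled)
    half = deviate / math.sqrt(total_weight)
    return math.tanh(z - half), math.tanh(z + half)

r_pool, w = pooled_rho([0.731, 0.690, 0.740], [44, 44, 44])
lo, hi = limits_from_pooled(r_pool, w, 2.576)
print(round(r_pool, 3), w, round(lo, 3), round(hi, 3))  # 0.721 123 0.59 0.815
```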

7.8 — Practical utility of correlation and regression. Over the last forty years, investigators have tended to increase their use of regression techniques and decrease their use of correlation techniques. Several reasons can be suggested.

The correlation coefficient $r$ merely estimates the degree of closeness of linear relationship between $Y$ and $X$, and the meaning of this concept is not easy to grasp. To ask whether the relation between $Y$ and $X$ is close or loose may be sufficient in an early stage of research. But more often the interesting questions are: How much does $Y$ change for a given change in $X$? What is the shape of the curve connecting $Y$ and $X$? How accurately can $Y$ be predicted from $X$? These questions are handled by regression techniques.

Secondly, the standard results for the distribution of $r$ as an estimate of a non-zero $\rho$ require random sampling from a bivariate normal population. The values of $X$ at which $Y$ is measured are often selected, either intentionally or because of operational restrictions. Such selection can distort the frequency distribution of $r$ to a marked degree.

The correlation between two variables may be due to their common 
relation to other variables. The organic correlations already mentioned 
are examples. A big animal tends to be big all over, so that two parts are 
correlated because of their participation in the general size. Over a period 
of years, many apparently unrelated variables rise or fall together within 
the same country or even in different countries. There is a correlation of
−0.98 between the annual birthrate in Great Britain, from 1875 to 1920,
and the annual production of pig iron in the United States. The matter 
was discussed by Yule (19) as a question: Why do we sometimes get 
nonsense-correlations between time series? Social, economic, and tech- 
nological changes produce the time trends that lead to such examples. 

In some problems the correlation coefficient enters naturally and use- 
fully. Correlation has played an important part in biometrical genetics, 
because many of the consequences of Mendelian inheritance, and later 
developments from it, are expressed conveniently in terms of the correla- 
tion between related persons or animals. 

A second example occurs when we are trying to select persons with high values of some skill Y by means of examination results X. If Y and X follow the bivariate normal distribution, the average Y value, say Y', of candidates whose exam score is X is given by the equation

$$\frac{Y' - \mu_Y}{\sigma_Y} = \rho\,\frac{X - \mu_X}{\sigma_X}$$

Suppose we select the top P% in the exam. For the normal curve, the average value of (X − μ_X)/σ_X for the selected men may be shown to be H/P when there are many candidates, where H is the ordinate of the normal curve at the point that separates the top P% from the rest. When P = 5% (0.05), the ordinate H = 0.1032, and H/P = 0.1032/0.05 = 2.06. Thus the average Y value of the top 5% is 2.06ρ in standard units. If ρ = 0.5, this average is 1.03. From the normal tables we find that when H/P = 1.03, the corresponding P is 36%. This means that with ρ = 0.5, the 5% most successful performers in the exam have only the same average ability as the top 36% of the original candidates. The size of ρ is the key factor in determining how well we can select high values of Y by a screening process based on X.
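
The selection argument can be checked numerically. This is a sketch using Python's `statistics.NormalDist`; the function names are illustrative, and the bisection search is just one convenient way to invert H/P.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def mean_of_top(P):
    """Average standard score of the top fraction P of a normal population.
    Equals H/P, where H is the normal ordinate at the cut-off point."""
    cutoff = STD_NORMAL.inv_cdf(1 - P)
    return STD_NORMAL.pdf(cutoff) / P

def equivalent_top_fraction(target, lo=1e-6, hi=1 - 1e-6):
    """Find by bisection the fraction P whose top-P mean equals `target`.
    mean_of_top(P) decreases as P increases."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_of_top(mid) > target:
            lo = mid   # group too select: its mean is above target, so widen it
        else:
            hi = mid
    return (lo + hi) / 2

rho = 0.5
h_over_p = mean_of_top(0.05)   # exam-score mean of the top 5%, in standard units
mean_y = rho * h_over_p        # their average Y, in standard units
print(round(h_over_p, 2), round(mean_y, 2), round(equivalent_top_fraction(mean_y), 2))
```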

In hydrology, suppose that there are annual records Y of the flow of a stream for a relatively short period of m years, and records X of a neighboring stream for a longer period of n years. Instead of using Ȳ_m as the estimate of the long-term mean μ_Y of Y, we might work out the regression of Y on X and predict μ_Y by the formula

$$\hat{\mu}_Y = \bar{Y}_m + b(\bar{X}_n - \bar{X}_m)$$

The proportional reduction in variance due to this technique, known 
as stream extension, is approximately 



190 Chapter 7: Correlation 

$$\frac{V(\bar{Y}_m) - V(\hat{\mu}_Y)}{V(\bar{Y}_m)} = \frac{n - m}{n}\left[\rho^2 - \frac{1 - \rho^2}{m - 3}\right]$$

Here again it is the value of ρ, along with the lengths of run available in the two streams, that determines whether this technique gives worthwhile gains in precision.
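
A small sketch of the stream-extension formula follows; the record lengths m = 13 and n = 40 and the trial values of ρ are hypothetical. Rearranging the bracket shows that the gain is positive only when ρ² > 1/(m − 2), which for m = 13 means ρ² above about 0.09.

```python
def stream_extension_gain(rho, m, n):
    """Approximate proportional reduction in variance,
    [V(Ybar_m) - V(muhat_Y)] / V(Ybar_m), from the regression estimate."""
    if not 3 < m < n:
        raise ValueError("need 3 < m < n")
    return (n - m) / n * (rho ** 2 - (1 - rho ** 2) / (m - 3))

# Hypothetical records: 13 years for stream Y, 40 years for stream X
for rho in (0.3, 0.6, 0.8):
    print(rho, round(stream_extension_gain(rho, m=13, n=40), 3))
```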

7.9 — Variances of sums and differences of correlated variables. When X₁ and X₂ are independent, a result used previously is that the variance of their sum is the sum of their variances. When they are correlated, the more general result is

$$\sigma^2_{X_1+X_2} = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2 \qquad (7.9.1)$$

Positive correlation increases the variance of a sum, negative correlation decreases it. The corresponding sample result is

$$s^2_{X_1+X_2} = s_1^2 + s_2^2 + 2rs_1s_2 \qquad (7.9.2)$$

This identity is occasionally used as a check on the computation of s₁, s₂, and r from a sample. For each member of the sample, X₁ + X₂ is written down and the sample variance of this quantity is obtained in the usual way.
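
The check described above can be sketched as follows; the six pairs of measurements are made up for illustration. Both sides use divisor n − 1, so they agree up to rounding error.

```python
from statistics import mean, stdev

def corr(x, y):
    """Sample correlation r = sum(xy) / sqrt(sum(x^2) sum(y^2)), on deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def check_sum_identity(x1, x2):
    """Return both sides of (7.9.2): var(X1 + X2) and s1^2 + s2^2 + 2 r s1 s2."""
    direct = stdev([a + b for a, b in zip(x1, x2)]) ** 2
    s1, s2, r = stdev(x1), stdev(x2), corr(x1, x2)
    return direct, s1 ** 2 + s2 ** 2 + 2 * r * s1 * s2

# Hypothetical paired measurements
x1 = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0]
x2 = [7.0, 9.0, 8.0, 12.0, 8.0, 11.0]
print(check_sum_identity(x1, x2))
```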
For the difference D = X₁ − X₂, the variance is

$$\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2 \qquad (7.9.3)$$

With differences, positive correlations decrease the variance. In paired experiments, the goal in pairing is to produce a positive correlation ρ between the members X₁, X₂ of a pair. The pairing does not affect the term (σ₁² + σ₂²) in (7.9.3), but brings in a negative term, −2ρσ₁σ₂.

If we have k variates, with ρ_ij the correlation between the ith and the jth variates, their sum S = X₁ + X₂ + ... + X_k has variance

$$\sigma_S^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2 + 2\rho_{12}\sigma_1\sigma_2 + 2\rho_{13}\sigma_1\sigma_3 + \cdots + 2\rho_{k-1,k}\sigma_{k-1}\sigma_k \qquad (7.9.4)$$

where the cross-product terms 2ρ_ij σ_i σ_j extend over every pair of variates.
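
Formula (7.9.4) translates directly into a loop over pairs. This is a sketch with hypothetical standard deviations and correlations; the function name is illustrative.

```python
from itertools import combinations

def var_of_sum(sds, corr):
    """Variance of S = X1 + ... + Xk by formula (7.9.4).

    sds[i] is the standard deviation of Xi; corr[i][j] is the correlation
    between Xi and Xj (the diagonal corr[i][i] is not used)."""
    total = sum(s ** 2 for s in sds)                   # sigma_1^2 + ... + sigma_k^2
    for i, j in combinations(range(len(sds)), 2):      # each pair i < j exactly once
        total += 2 * corr[i][j] * sds[i] * sds[j]
    return total

# Hypothetical three-variate example
sds = [2.0, 3.0, 1.5]
corr = [[1.0, 0.4, -0.2],
        [0.4, 1.0, 0.1],
        [-0.2, 0.1, 1.0]]
print(var_of_sum(sds, corr))
```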

EXAMPLE 7.9.1 — To prove formula (7.9.1), note that by the definition of a variance, the variance of X₁ + X₂ is the average value of (X₁ + X₂ − μ₁ − μ₂)², taken over the population. Write this as

$$E\{(X_1 - \mu_1) + (X_2 - \mu_2)\}^2 = E(X_1 - \mu_1)^2 + E(X_2 - \mu_2)^2 + 2E(X_1 - \mu_1)(X_2 - \mu_2)$$

where the symbol E (expected value) stands for “the average value of.” This gives

$$\sigma^2_{X_1+X_2} = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2,$$

since by equation (7.4.2) (p. 180), E(X₁ − μ₁)(X₂ − μ₂) = ρσ₁σ₂. Formulas (7.9.3) and (7.9.4) are proved in the same way.



EXAMPLE 7.9.2 — In a sample of 300 ears of corn (7), the weight of the grain, G, had a standard deviation s_G = 24.62 gms.; the weight of the cob, C, had a standard deviation s_C = 4.19 gms.; and r_GC was 0.6906. Show that the total ear weight W = G + C had s_W = 27.7 gms. and that r_WG = 0.994.
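
The arithmetic of Example 7.9.2 can be sketched as below. It uses (7.9.2) for s_W, together with Cov(W, G) = s_G² + r_GC s_G s_C, which follows from W = G + C.

```python
import math

s_g, s_c, r_gc = 24.62, 4.19, 0.6906      # grain, cob, their correlation (Example 7.9.2)
cov_gc = r_gc * s_g * s_c                  # sample covariance of G and C
s_w = math.sqrt(s_g ** 2 + s_c ** 2 + 2 * cov_gc)   # formula (7.9.2) with W = G + C
r_wg = (s_g ** 2 + cov_gc) / (s_w * s_g)            # Cov(W, G) = s_G^2 + Cov(G, C)
print(round(s_w, 1), round(r_wg, 3))
```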

EXAMPLE 7.9.3 — In table 7.1.1, subtract each sister's height from her brother's, then compute the corrected sum of squares of the differences. Verify by formula (7.9.3) that your result agrees with the values Σx₁² = 74, Σx₂² = 66, Σx₁x₂ = 39, given under table 7.1.1.

EXAMPLE 7.9.4 — If r₁₂ = 1, show that s_D = s₁ − s₂, where s₁ > s₂.

7.10 — The calculation of r in a large sample. When the sample is large, the variates X and Y are often grouped into classes, as illustrated in table 7.10.1 for a sample of 327 ears of corn (20). The diameters X are in millimeter classes and the weights Y in 10-gram classes. The figures in the body of the table are the frequencies f_xy in each X and Y class. Looking at the class with diameter 48 and weight 300, we see that there were f_xy = 3 ears in this class, i.e., with diameters between 47.5 and 48.5 mm., and weights between 295 and 305 gms. Correlation in these data is evidenced by the tendency of high frequencies to lie along the diagonal of the table, leaving two corners blank: there are no very heavy ears with small diameters.

The steps in the calculation are as follows:

1. Add the frequencies in each row, giving the column of values f_y, and in each column, giving the row of values f_x.

2. Construct a convenient coding of the weights and diameters, writing down the coded Y and X values.

3. Write down a column of the values Yf_y and a row of the values Xf_x.

4. The quantities ΣXf_x, ΣYf_y, Σx², and Σy² are now found on the calculating machine in the usual way, and are entered in table 7.10.2.

5. The device for finding Σxy is new. In each row, multiply the f_xy by the corresponding coded X, and add along the row. As examples:

(i) In the 4th row: (1)(2) + (1)(4) = 6
(ii) In the 7th row: (1)(−2) + (3)(−1) + (7)(1) + (3)(3) + (3)(4) = 23

These are entered in the right-hand column, ΣXf_xy. Then form the sum of products of this column with the coded Y column, giving ΣXYf_xy = 2,318. The correction term is subtracted as shown in table 7.10.2 to give Σxy = 2,323.20.

6. The value of r is now computed (table 7.10.2). No decoding is necessary for r.

As partial checks, the f_x and f_y values both add to the sample size, while the column ΣXf_xy in step 5 adds to the value ΣXf_x found in step 4.
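
Steps 1 to 6 can be sketched in Python. This is a minimal sketch, assuming the frequency table is supplied as a list of rows (one per coded Y value) with columns indexed by coded X; the function names are hypothetical. Since the body of table 7.10.1 is not reproduced here, the usage line feeds in the totals from table 7.10.2 instead.

```python
import math

def grouped_sums(x_codes, y_codes, freq):
    """Uncorrected sums for r from a two-way frequency table.
    freq[i][j] is the count for coded Y = y_codes[i] and coded X = x_codes[j]."""
    n = sx = sy = sxx = syy = sxy = 0
    for y, row in zip(y_codes, freq):
        f_y = sum(row)                                     # step 1: row total
        row_x = sum(f * x for f, x in zip(row, x_codes))   # step 5: sum of X f_xy along the row
        n += f_y
        sy += y * f_y
        syy += y * y * f_y
        sxy += y * row_x                                   # step 5: times coded Y, summed
    for j, x in enumerate(x_codes):
        f_x = sum(row[j] for row in freq)                  # step 1: column total
        sx += x * f_x
        sxx += x * x * f_x
    return n, sx, sy, sxx, syy, sxy

def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Subtract the correction terms, then r = sum(xy) / sqrt(sum(x^2) sum(y^2))."""
    cxx = sxx - sx * sx / n
    cyy = syy - sy * sy / n
    cxy = sxy - sx * sy / n
    return cxy / math.sqrt(cxx * cyy)

# Coded totals from table 7.10.2
print(round(r_from_sums(327, 37, -46, 2279, 7264, 2318), 4))
```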

A large sample provides a good opportunity for checking the assumptions required for the distribution of r. If each number ΣXf_xy in the right-hand column is divided by the corresponding f_y, we obtain the mean of X in each array (weight class). These may be plotted against Y to see whether the regression of X on Y appears linear. Similarly, by



TABLE 7.10.1

Computation of Sample Correlation Coefficient in Two-Way Frequency Table
(Frequency of occurrence of ears of maize having each diameter and weight)




TABLE 7.10.2

Calculation of Correlation Coefficient in Table 7.10.1

X                          Y                          XY
ΣXf_x = 37                 ΣYf_y = −46
ΣX²f_x = 2,279             ΣY²f_y = 7,264             ΣXYf_xy = 2,318
(ΣXf_x)²/n = 4.19          (ΣYf_y)²/n = 6.47          (ΣXf_x)(ΣYf_y)/n = −5.20
Σx² = 2,274.81             Σy² = 7,257.53             Σxy = 2,323.20

$$r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} = \frac{2{,}323.20}{\sqrt{(2{,}274.81)(7{,}257.53)}} = 0.5718$$

extra calculation the sums of Y and the Y means may be obtained for each column and plotted against X. A test for linearity of regression is given in section 15.4. The model also assumes that the variances of Y in each column, s_y², are estimates of the same quantity, and similarly for the variances s_x² of X within each row. Section 10.21 supplies a test of homogeneity of variances.

EXAMPLE 7.10.1 — Using the data in columns f_y and Y of table 7.10.1, calculate Σy² = 7,257.53, together with the sample mean and standard error, 198.6 ± 2.61.

EXAMPLE 7.10.2 — Calculate the sample mean, 44.1, and standard deviation, 2.64, in the 42-millimeter array of weights, table 7.10.1.

EXAMPLE 7.10.3 — In the 200-gram array of diameters, compute X = 198.6 and s = 47.18.

EXAMPLE 7.10.4 — Compute the sample regression coefficient of weight on diameter, 1.0213, together with the regression equation Ŷ = 1.0213X + 154.81.

EXAMPLE 7.10.5 — Calculate the mean diameter in each of the 28 weight arrays. Plot these means against the weight class marks. Does there seem to be any pronounced curvilinearity in the regression of these mean diameters on the weight? Can you write the regression equation giving estimated diameter for each weight?

EXAMPLE 7.10.6 — Calculate the sample mean weight of the ears in each of the 16 diameter arrays of table 7.10.1. Present these means graphically as ordinates with the corresponding diameters as abscissas. Plot the graph of the regression equation on the same figure. Do you get a good fit? Is there any evidence of curvilinearity in the regression of means?

7.11 — Non-parametric methods. Rank correlation. Often, a bivariate population is far from normal. In that event, the computation of r as an estimate of ρ is no longer valid. In some cases a transformation of the variables X₁ and X₂ brings their joint distribution close to the bivariate normal, making it possible to estimate ρ in the new scale. Failing this, methods of expressing the amount of correlation in non-normal data by means of a parameter like ρ have not proceeded very far.

Nevertheless, we may still want to examine whether two variables are independent or whether they vary in the same or in opposite directions. For a test of the null hypothesis that there is no correlation, r may be used provided that one of the variables is normal. When neither variable seems

TABLE 7.11.1

Ranking of Seven Rats by Two Observers of Their Condition After Three Weeks on a Deficient Diet

Rat        Ranking by                Difference,
Number     Observer 1   Observer 2   d          d²
1          4            4             0         0
2          1            2            −1         1
3          6            5             1         1
4          5            6            −1         1
5          3            1             2         4
6          2            3            −1         1
7          7            7             0         0
                                                Σd² = 8

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{7(49 - 1)} = 0.857$$


normal, the best-known procedure is that in which X₁ and X₂ are both rankings. If two judges each rank 12 abstract paintings in order of attractiveness, we may wish to know whether there is any degree of agreement among the rankings. Table 7.11.1 shows similar rankings of the condition of 7 rats after a period of deficient feeding. With data that are not initially ranked, the first step is to rank X₁ and X₂ separately.

The rank correlation coefficient, due to Spearman (11) and usually denoted by r_s, is the ordinary correlation coefficient r between the ranked values X₁ and X₂. It can be calculated in the usual way as Σx₁x₂/√((Σx₁²)(Σx₂²)). An easier method of computing r_s is given by the formula

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

whose calculation is explained in table 7.11.1. Like r, the rank correlation can range in samples from −1 (complete discordance) to +1 (complete concordance).
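
Spearman's shortcut formula and the ordinary correlation of the ranks can be compared on the rat data of table 7.11.1. This sketch assumes untied ranks; with ties the two calculations need not agree.

```python
def spearman_rs(rank1, rank2):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)); assumes untied ranks 1..n."""
    n = len(rank1)
    d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * d2 / (n * (n * n - 1))

def pearson(x, y):
    """Ordinary correlation coefficient r, here applied to the ranks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Table 7.11.1: ranks for rats 1 to 7
observer1 = [4, 1, 6, 5, 3, 2, 7]
observer2 = [4, 2, 5, 6, 1, 3, 7]
print(round(spearman_rs(observer1, observer2), 3), round(pearson(observer1, observer2), 3))
```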

For samples of 10 or fewer pairs, the significance levels of r_s, worked out by Kendall (12), (13), are given in table 7.11.2. In the rankings of the rats, r_s = 0.857 with 7 pairs. The correlation is significant at the 5% level but not at the 1%. For samples of more than 10 pairs, the null distribution of r_s is similar to that of r, and table A 11 is used for testing r_s. Remember that the degrees of freedom in table A 11 are two less than the number of pairs (size of sample).

Another measure of degree of concordance, closely related to r_s, is Kendall's τ (12). To compute this, rearrange the two rankings so that




TABLE 7.11.2

Significance Levels of r_s in Small Samples

Size of Sample     5% Level     1% Level
4 or less          none         none
5                  1.000        none
6                  0.886        1.000
7                  0.750        0.893
8                  0.714        0.857
9                  0.683        0.833
10                 0.64?        0.794
11 or more         Use table A 11 (p. 557)


one of them is in the order 1, 2, 3, ..., n. For table 7.11.1, putting observer 1 in this order, we have:

Rat No.        2    6    5    1    4    3    7
Observer 1     1    2    3    4    5    6    7
Observer 2     2    3    1    4    6    5    7

Taking each rank given by observer 2 in turn, count how many of the ranks to the right of it are smaller than it, and add these counts. For the rank 2 given to rat No. 2 the count is 1, since only rat 5 has a smaller rank. The six counts are 1, 1, 0, 0, 1, 0, there being no need to count the extreme right rank. The total is Q = 3. Kendall's τ is

$$\tau = 1 - \frac{4Q}{n(n - 1)} = 1 - \frac{12}{42} = 0.714$$


Like r_s, τ lies between +1 (complete concordance) and −1 (complete disagreement). It takes a little longer to compute, but its frequency distribution on the null hypothesis is simpler and it can be extended to study partial correlation. For details, see (12).
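
Kendall's counting procedure can be sketched as below, again assuming no tied ranks; `kendall_tau` is an illustrative name, not a library function.

```python
def kendall_tau(rank1, rank2):
    """Kendall's tau by the counting method in the text (untied ranks assumed)."""
    # Put observer 1 in the order 1, 2, ..., n and carry observer 2's ranks along.
    ordered = [r2 for _, r2 in sorted(zip(rank1, rank2))]
    n = len(ordered)
    # Q: for each rank, count how many ranks to its right are smaller, and add.
    q = sum(1 for i in range(n) for j in range(i + 1, n) if ordered[j] < ordered[i])
    return 1 - 4 * q / (n * (n - 1)), q, ordered

tau, q, ordered = kendall_tau([4, 1, 6, 5, 3, 2, 7], [4, 2, 5, 6, 1, 3, 7])
print(ordered, q, round(tau, 3))
```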

The quantities r_s and τ can be used as a measure of ability to appraise or detect something by ranking. For instance, a group of subjects might each be given bottles containing four different strengths of a delicate perfume and asked to place the bottles in order of the concentration of perfume. If X₁ represents the correct ranking of the strengths and X₂ a subject's ranking, the value of r_s or τ for this subject measures, although rather crudely, his success at this task. From the results for a sample of men and women we could investigate whether women are better at this task than men. The difference between the average values of τ for women and men could be tested approximately by an ordinary t-test.

7.12 — The comparison of two correlated variances. In section 4.15 (p. 116) we showed how to test the null hypothesis that two independent estimates of variance, s₁² and s₂², are each estimates of the same unknown population variance σ². The procedure was to calculate F = s₁²/s₂², where s₁² is the larger of the two, and refer to table 4.15.1 or table A 14.

This problem arises also when the two estimates s₁² and s₂² are correlated. For instance, in the sample of pairs of brothers and sisters (section 7.1), we might wish to test whether brother heights, X₁, are more or less variable than sister heights, X₂. We can calculate s₁² and s₂², the variances of the two heights between families. But in our sample of 11 families the correlation between X₁ and X₂ was found to be r = 0.558. Although this did not reach the 5% level of r (0.602 for 9 d.f.), the presence of a correlation was confirmed by Pearson and Lee's value of r = 0.553 for the sample of 1,401 families from which our data were drawn. In another application, a specimen may be sent to two laboratories that make estimates X₁, X₂ of the concentration of a rare element contained in it. If a number of specimens are sent, we might wish to examine whether one laboratory gives more variability in results than the other.

The test to be described is valid for a sample of pairs of values X₁, X₂ that follows a bivariate normal distribution. It holds for any value ρ of the population correlation between X₁ and X₂. If you are confident that ρ is zero, the ordinary F-test should be used, since it is slightly more powerful. When ρ is not zero, the F-test is invalid.

The test is derived by an ingenious approach due to Pitman (15). Suppose that X₁ and X₂ have variances σ₁² and σ₂² and correlation ρ. The null hypothesis states that σ₁² = σ₂²; for the moment, we are not assuming that the null hypothesis is necessarily true. Since X₁ and X₂ follow a bivariate normal, it is known that D = X₁ − X₂ and S = X₁ + X₂ also follow a bivariate normal. Let us calculate the correlation ρ_DS between D and S. From section 7.9,

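$$\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$$
$$\sigma_S^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2$$
$$\mathrm{Cov}(D, S) = \mathrm{Cov}\{(X_1 - X_2)(X_1 + X_2)\} = \sigma_1^2 - \sigma_2^2$$

since the two terms in Cov(X₁X₂) cancel. Hence

$$\rho_{DS} = \frac{\sigma_1^2 - \sigma_2^2}{\sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4\rho^2\sigma_1^2\sigma_2^2}}$$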

If φ = σ₁²/σ₂² is the variance-ratio of σ₁² to σ₂², this may be written

$$\rho_{DS} = \frac{\phi - 1}{\sqrt{(\phi + 1)^2 - 4\rho^2\phi}} \qquad (7.12.1)$$

Under the null hypothesis, φ = 1, so that ρ_DS = 0. If σ₁² > σ₂², then φ > 1 and ρ_DS is positive, while if σ₁² < σ₂², ρ_DS is negative.

Thus, the null hypothesis can be tested by finding D and S for each pair, computing the sample correlation coefficient r_DS, and referring to table A 11. A significantly positive value of r_DS indicates σ₁² > σ₂², while a significantly negative one indicates σ₁² < σ₂².

Alternatively, by the same method that led to equation (7.12.1), r_DS can be computed as

$$r_{DS} = \frac{F - 1}{\sqrt{(F + 1)^2 - 4r^2F}} \qquad (7.12.2)$$

where F = s₁²/s₂² and r is the correlation between X₁ and X₂.

In a sample of 173 boys, aged 13-14, height had a standard deviation s₁ = 5.299, while leg length gave s₂ = 4.766, both figures being expressed as percentages of the sample means (16). The correlation between height and leg length was r = 0.878, a high value, as would be expected. To test whether height is relatively more variable than leg length, we have

$$F = (5.299/4.766)^2 = 1.237$$

and from equation (7.12.2),

$$r_{DS} = \frac{0.237}{\sqrt{(2.237)^2 - 4(0.878)^2(1.237)}} = \frac{0.237}{1.136} = 0.209$$

with d.f. = 173 − 2 = 171. This value of r_DS is significant at the 1% level, since table A 11 gives the 1% level as 0.208 for 150 d.f.

The above test is two-tailed; for a one-tailed test, use the 10% and 2% levels in table A 11.
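
A sketch of Pitman's test on the height and leg-length figures follows. Evaluating (7.12.2) directly with F = 1.237 and r = 0.878 gives a denominator of about 1.091 and r_DS ≈ 0.217, rather than the printed 1.136 and 0.209. The cause (rounding or a transcription slip in one of the inputs) cannot be settled from the text, and either value exceeds the 1% point 0.208. The conversion to t uses the standard relation t = r√(n − 2)/√(1 − r²); the function names are illustrative.

```python
import math

def r_ds_from_f(F, r):
    """r_DS by (7.12.2): F = s1^2 / s2^2, r = correlation between X1 and X2."""
    return (F - 1) / math.sqrt((F + 1) ** 2 - 4 * r ** 2 * F)

def r_to_t(r_ds, n):
    """Equivalent t with n - 2 d.f. for a sample correlation r_ds."""
    return r_ds * math.sqrt(n - 2) / math.sqrt(1 - r_ds ** 2)

r_ds = r_ds_from_f(1.237, 0.878)   # height vs leg length, n = 173
t = r_to_t(r_ds, 173)
print(round(r_ds, 3), round(t, 2))
```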

This approach also provides confidence limits for φ from a knowledge of F and r. The variates D' = (X₁/σ₁ − X₂/σ₂) and S' = (X₁/σ₁ + X₂/σ₂) are uncorrelated whether σ₁ equals σ₂ or not. The sample correlation coefficient between these variates, say R, therefore follows the usual distribution of a sample correlation when ρ = 0. As a generalization of formula (7.12.2), the value of R may be shown to be

$$R = \frac{F - \phi}{\sqrt{(F + \phi)^2 - 4r^2F\phi}}$$

In applying this result, it is easier to use the t-table than that of r. The value of t is

$$t = \frac{(F - \phi)\sqrt{n - 2}}{2\sqrt{(1 - r^2)F\phi}} \qquad (7.12.3)$$

If φ is much smaller than F, t becomes large and positive; if φ is much larger than F, t becomes large and negative. Values of φ that make t lie between the limits ±t₀.₀₅ form a 95% confidence interval. The limits found by solving (7.12.3) for φ are computed as

$$\phi = F\{K \pm \sqrt{K^2 - 1}\}$$

where

$$K = 1 + \frac{2(1 - r^2)\,t_{0.05}^2}{n - 2}$$

and the d.f. for t₀.₀₅ are n − 2.
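
The confidence limits for φ can be sketched as follows. The value t₀.₀₅ = 1.974 for 171 d.f. is an approximate tabular value supplied here as an assumption; the text does not give it. With the leg-length inputs the limits come out at roughly 1.07 and 1.43. Substituting each limit back into (7.12.3) should return ±t₀.₀₅, which serves as a check on the algebra.

```python
import math

def phi_limits(F, r, n, t_crit):
    """Confidence limits for phi = sigma1^2 / sigma2^2 from the solution of (7.12.3)."""
    K = 1 + 2 * (1 - r ** 2) * t_crit ** 2 / (n - 2)
    root = math.sqrt(K ** 2 - 1)
    return F * (K - root), F * (K + root)

def t_stat(F, phi, r, n):
    """t of (7.12.3) for a trial value of phi."""
    return (F - phi) * math.sqrt(n - 2) / (2 * math.sqrt((1 - r ** 2) * F * phi))

# Leg-length example: F = 1.237, r = 0.878, n = 173.
# t_crit = 1.974 is an assumed approximate 5% two-tailed t for 171 d.f.
lo, hi = phi_limits(1.237, 0.878, 173, t_crit=1.974)
print(round(lo, 3), round(hi, 3))
print(round(t_stat(1.237, lo, 0.878, 173), 3), round(t_stat(1.237, hi, 0.878, 173), 3))
```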

CHAPTER EIGHT

Sampling from the binomial distribution


8.1 — Introduction. In chapter 1 the sampling of attributes was used to introduce some common statistical terms and techniques: estimators, confidence intervals, the binomial distribution, tests of significance, and the chi-square test as applied to a simple proportion. We return to the sampling of attributes in order to fill in the mathematical background of these techniques. The binomial distribution and its relation to the normal distribution will be examined more thoroughly. Further, just as you learned how to compare the means of two normal samples, independent or paired, we shall study the comparison of two proportions from independent samples and from paired samples.

Suppose that an attribute is possessed by a proportion p of the members of a population. A random sample of size n is drawn. The binomial distribution gives a formula for the probability that the sample contains exactly r members having the attribute. The formula is derived from some rules in the theory of probability, now to be explained.

8.2 — Some simple rules of probability. The study of probability began around three hundred years ago. At that time, gambling and games of chance had become a fashionable pastime, and there was much interest in questions about the chance that a certain type of card would be drawn from a pack or that a die would fall in a certain way.

In a problem in probability, we are dealing with a trial about to be made that can have a number of different outcomes. A six-sided die, when thrown, may show any of the numbers 1, 2, 3, 4, 5, 6 face upwards; these are the outcomes. Simpler problems in probability can often be solved by writing down all the different possible outcomes of the trial and recognizing that these are equally likely. Suppose that the letters a, b, c, d, e, f, g are written on identical balls which are placed in a bag and mixed thoroughly. One ball is drawn out blindly. Most people would say without hesitation that the probability that an a is drawn is 1/7, because there are 7 balls, one of them is certain to be drawn, and all are equally likely. In general terms this result may be stated as follows:

Rule 1. If a trial has k equally likely outcomes, of which one and only one will happen, the probability of any individual outcome is 1/k.

The claim that the outcomes are equally likely must be justified by 
knowledge of the exact nature of the trial. For instance, dice to be used 
in gambling for stakes are manufactured with care to ensure that they are 
cubes of even density. They are discarded by gambling establishments 
after a period of use, in case the wear, though not detectable by the naked 
eye, has made the six outcomes no longer equally likely. The statement 
that the probability is 1/52 of drawing the ace of spades from an ordinary 
pack of cards assumes a thorough shuffling that is difficult to attain, par- 
ticularly when the cards are at all worn. 

In some problems the event in which we are interested will happen if any one of a specific group of outcomes turns up when the trial is made. With the letters a, b, c, d, e, f, g, suppose we ask “What is the probability of drawing a vowel?” The event is now “A vowel is drawn.” This will happen if either an a or an e is the outcome. Most people would say that the probability is 2/7, because there are 2 vowels present out of seven competing letters, and each letter is equally likely. Similarly, the probability that the letter drawn is one of the first four letters is 4/7. These results are an application of a second rule of probability.

Rule 2. (The Addition Rule). If an event is satisfied by any one of a group of mutually exclusive outcomes, the probability of the event is the sum of the probabilities of the outcomes in the group.

In mathematical terminology, this rule is sometimes stated as:

$$P(E) = P(O_1 \text{ or } O_2 \text{ or } \ldots \text{ or } O_m) = P(O_1) + P(O_2) + \cdots + P(O_m)$$

where P(O_i) denotes the probability of the ith outcome.

Rule 2 contains one condition: the outcomes in the group must be
mutually exclusive. This phrase means that if any one of the outcomes 
happens, all the others fail to happen. The outcomes “a is drawn” and 
“e is drawn” are mutually exclusive. But the outcomes “a vowel is drawn” 
and “one of the first four letters is drawn” are not mutually exclusive, 
because if a vowel is drawn, it might be an a , in which case the event “one 
of the first four letters is drawn” has also happened. 

The condition of mutual exclusiveness is essential. If it does not hold, Rule 2 gives the wrong answer. To illustrate, consider the probability that the letter drawn is either one of the first four letters or is a vowel. Of the seven original outcomes, a, b, c, d, e, f, g, five satisfy the event in question, namely a, b, c, d, e. The probability is given correctly by Rule 2 as 5/7, because these five outcomes are mutually exclusive. But we might try to shortcut the solution by saying “The probability that one of the first four letters is drawn is 4/7 and the probability that a vowel is drawn is 2/7. Therefore, by Rule 2, the probability that one or the other of these happens is 6/7.” This, you will note, is the wrong answer.
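
The failure of Rule 2 for overlapping events can be demonstrated by counting. This sketch uses Python's `fractions` module; the last expression subtracts the overlap (the letter a), a correction known as the general addition rule, which the text does not state.

```python
from fractions import Fraction

letters = set("abcdefg")    # seven equally likely outcomes
first_four = set("abcd")
vowels = {"a", "e"}

def prob(event):
    """Rules 1 and 2: favourable outcomes divided by equally likely outcomes."""
    return Fraction(len(event & letters), len(letters))

correct = prob(first_four | vowels)        # a, b, c, d, e: mutually exclusive outcomes
naive = prob(first_four) + prob(vowels)    # counts the outcome 'a' twice
print(correct, naive, naive - prob(first_four & vowels))
```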

In leading up to the binomial distribution we have to consider the 
results of repeated drawings from a population. The successive trials or 
drawings are assumed independent of one another. This term means that 
the outcome of a trial does not depend in any way on what happens in 
the other trials. 

With a series of trials the easier problems can again be solved by Rules 1 and 2. For example, a bag contains the letters a, b, c. In trial 1 a ball is drawn after thorough mixing. The ball is replaced, and in trial 2 a ball is again drawn after thorough mixing. What is the probability that both balls are a? First, we list all possible outcomes of the two trials. These are (a, a), (a, b), (a, c), (b, a), (b, b), (b, c), (c, a), (c, b), (c, c), where the first letter in a pair is the result of trial 1 and the second that of trial 2. Then we claim that these nine outcomes of the pair of trials are equally likely. Challenged to support this claim, we might say: (i) a, b, and c are equally likely at the first draw, because of the thorough mixing, and (ii) at the second draw, the conditions of thorough mixing and of independence make all nine outcomes equally likely. The probability of (a, a) is therefore 1/9.

Similarly, suppose we are asked the probability that the two drawings contain no c's. This event is satisfied by four mutually exclusive outcomes: (a, a), (a, b), (b, a), and (b, b). Consequently, the probability (by Rule 2) is 4/9.

Both the previous results can be obtained more quickly by noticing that the probability of the combined event is the product of the probabilities of the desired events in the individual trials. In the first problem the probability of an a is 1/3 in the first trial and also 1/3 in the second trial. The probability that both events happen is 1/3 × 1/3 = 1/9. In the second problem, the probability of not drawing a c is 2/3 in each individual trial. The probability of the combined event (no c at either trial) is 2/3 × 2/3 = 4/9. A little reflection will show that the numerator of this product (1 or 4) is the number of equally likely outcomes of the two drawings that satisfy the desired combined event. The denominator, 9, is the total number of equally likely outcomes in the combined trials. The probabilities need not be equal at the two drawings. For example, the probability of getting an a at the first trial but not at the second is 1/3 × 2/3 = 2/9, the outcomes that produce this event being (a, b) and (a, c).

Rule 3. (The Multiplication Rule). In a series of independent trials, the probability that each of a specified series of events happens is the product of the probabilities of the individual events.

In mathematical terms,

$$P(E_1 \text{ and } E_2 \ldots \text{ and } E_m) = P(E_1)P(E_2)\cdots P(E_m)$$
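
The three products worked out above can be confirmed by listing the nine equally likely pairs, as in this sketch:

```python
from fractions import Fraction
from itertools import product

bag = ["a", "b", "c"]
pairs = list(product(bag, repeat=2))   # 9 equally likely (trial 1, trial 2) outcomes

def prob(condition):
    """Rules 1 and 2: count the pairs satisfying the event."""
    return Fraction(sum(1 for p in pairs if condition(p)), len(pairs))

both_a = prob(lambda p: p == ("a", "a"))
no_c = prob(lambda p: "c" not in p)
a_then_not_a = prob(lambda p: p[0] == "a" and p[1] != "a")

# Rule 3: each count agrees with the product of single-trial probabilities
print(both_a == Fraction(1, 3) * Fraction(1, 3),
      no_c == Fraction(2, 3) * Fraction(2, 3),
      a_then_not_a == Fraction(1, 3) * Fraction(2, 3))
```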

In practice, the assumption that trials are independent, like the
assumption that outcomes are equally likely, must be justified by knowledge
of the circumstances of the trials. In complex probability problems there 
have been disputes about the validity of these assumptions in particular 
applications, and some interesting historical errors have occurred. 

This account of probability provides only the minimum background 
needed for working out the binomial distribution. Reference (1) is recom- 
mended as a more thorough introduction to this important subject at an 
elementary mathematical level. 

EXAMPLE 8.2.1 — A bag contains the letters A, b, c, D, e, f, G, h, I. If each letter is equally likely to be drawn, what is the probability of drawing: (i) a capital letter, (ii) a vowel, (iii) either a capital or a vowel? Ans. (i) 4/9, (ii) 1/3, (iii) 5/9. Does Rule 2 apply to the two events mentioned in (iii)?

EXAMPLE 8.2.2 — Three bags contain, respectively, the letters a, b; c, d, e; f, g, h, i.
A letter is drawn independently from each bag. Write down all 24 equally likely outcomes of 
the three drawings. Show that six of them give a consonant from each bag. Verify that 
Rule 3 gives the correct probability of drawing a consonant from each bag (1/4). 

EXAMPLE 8.2.3 — Two six-sided dice are thrown independently. Find the probability:
(i) that the first die gives a 6 and the second at least a 3, (ii) that one die gives a 6 and the 
other at least a 3, (iii) that both give at least a 3, (iv) that the sum of the two scores is not 
more than 5. Ans. (i) 1/9, (ii) 2/9, (iii) 4/9, (iv) 5/18. 

EXAMPLE 8.2.4 — From a bag with the letters a, b, c, d, e a letter is drawn and laid
aside, then a second is drawn. By writing down all equally likely pairs of outcomes, show 
that the probability that both letters are vowels is 1/10. This is a problem to which Rule 3 
does not apply. Why not? 

EXAMPLE 8.2.5 — If two trials are not independent, the probability that event E₁ happens at the first trial and E₂ at the second is obtained (1) by a generalization of Rule 3: P(E₁ and E₂) = P(E₁)P(E₂, given that E₁ has happened). This last factor is called the conditional probability of E₂ given E₁, and is usually written P(E₂|E₁). Show that this rule gives the answer, 1/10, in example 8.2.4, where E₁, E₂ are the events of drawing a vowel at the first and second trials, respectively.
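
Examples 8.2.4 and 8.2.5 can be checked by enumerating the 20 equally likely ordered pairs drawn without replacement. In this sketch the conditional probability 1/4 is written out by hand: after a vowel is laid aside, one vowel remains among four letters.

```python
from fractions import Fraction
from itertools import permutations

bag = "abcde"
pairs = list(permutations(bag, 2))    # 20 ordered draws without replacement
vowels = set("ae")

# Direct count of equally likely outcomes (Example 8.2.4)
both_vowels = Fraction(sum(1 for x, y in pairs if x in vowels and y in vowels), len(pairs))

# Generalized Rule 3 (Example 8.2.5): P(E1) * P(E2 | E1)
p_e1 = Fraction(len(vowels), len(bag))                     # 2/5
p_e2_given_e1 = Fraction(len(vowels) - 1, len(bag) - 1)    # 1/4
print(both_vowels, p_e1 * p_e2_given_e1)
```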

In many applications, the probability of a particular outcome must 
be determined by a statistical study. For instance, insurance companies 
are interested in the probability that a man aged sixty will live for the next 
ten years. This quantity is calculated from national statistics of the age 
distribution of males and of the age distribution of deaths of males, and 
is published in actuarial tables. Provided that the conditions of inde- 
pendence and of mutually exclusive outcomes hold where necessary, Rules 
2 and 3 are applied to probabilities of this type also. Thus, the probability 
that three men aged sixty, selected at random from a population, will all 
survive for ten years would be taken as p³, where p is the probability that
an individual sixty-year-old man will survive for ten years. 

8.3 — The binomial distribution. A proportion p of the members of a population possess some attribute. A sample of size n = 2 is drawn. The result of a trial is denoted by S (success) if the member drawn has the attribute and by F (failure) if it does not. In a single drawing, p is the

TABLE 8.3.1

The Binomial Distribution for n = 2

(1)                     (2)            (3)          (4)
Outcomes of Trial       Probability    No. of       Probability
1      2                               Successes
F      F                qq             0            q²
F      S                qp             1            2pq
S      F                pq
S      S                pp             2            p²
Total                   1                           1

probability of obtaining an S, while q = 1 — p is the probability of ob- 
taining an F. Table 8.3.1 shows the four mutually exclusive outcomes of 
the two drawings, in terms of successes and failures. 

The probabilities given in column (2) are obtained by applying
Rule 3 to the two trials. For example, the probability of two successive
F's is qq, or q². This assumes, of course, that the two trials are inde-
pendent, as is necessary if the binomial distribution is to hold. Coming
to the third column, we are now counting the number of successes. Since
the two middle outcomes, FS and SF, both give 1 success, the probability
of 1 success is 2pq by Rule 2. The third and fourth columns present the
binomial distribution for n = 2. As a check, the probabilities in columns
2 and 4 each add to unity, since

$$q^2 + 2pq + p^2 = (q + p)^2 = (1)^2 = 1$$




TABLE 8.3.2
The Binomial Distribution for n = 3

          (1)                  (2)            (3)           (4)
   Outcomes of Trial                         No. of
   1      2      3          Probability     Successes     Probability
   F      F      F              qqq             0              q³
   F      F      S              qqp
   F      S      F              qpq             1             3pq²
   S      F      F              pqq
   F      S      S              qpp
   S      F      S              pqp             2             3p²q
   S      S      F              ppq
   S      S      S              ppp             3              p³
In the same way, table 8.3.2 lists the eight relevant outcomes for n = 3.
The probabilities in the second and fourth columns are obtained by Rules
3 and 2 as before. Three outcomes provide 1 success, with total prob-
ability 3pq², while three provide 2 successes with total probability 3p²q.
Check that the eight outcomes in the first column are mutually exclusive.
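The construction in tables 8.3.1 and 8.3.2 can be checked by brute force. The Python sketch below is our own illustration, not part of the original method, and the function name `binomial_by_enumeration` is hypothetical. It lists all 2ⁿ sequences of S's and F's, multiplies within a sequence (Rule 3, assuming independent trials) and adds across sequences with the same number of successes (Rule 2).

```python
from collections import defaultdict
from itertools import product


def binomial_by_enumeration(n, p):
    """Distribution of the number of successes in n independent trials,
    built by listing every sequence of S's and F's (tables 8.3.1, 8.3.2)."""
    q = 1 - p
    dist = defaultdict(float)
    for outcome in product("FS", repeat=n):   # the 2**n mutually exclusive outcomes
        prob = 1.0
        for letter in outcome:                 # Rule 3: multiply across independent trials
            prob *= p if letter == "S" else q
        dist[outcome.count("S")] += prob       # Rule 2: add outcomes with the same count
    return dict(dist)


dist = binomial_by_enumeration(3, 0.2)
for r in sorted(dist):
    print(r, round(dist[r], 4))
```

For n = 3 and p = 0.2 this should print 0.512, 0.384, 0.096 and 0.008 for r = 0, 1, 2, 3, that is, q³, 3pq², 3p²q and p³. The work grows as 2ⁿ, so enumeration is only practical for small n. This is why a general formula is needed.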

The general structure of the binomial formula is now apparent. The
formula for the probability of r successes in n trials has two parts. One
part is the term pʳqⁿ⁻ʳ. This follows from Rule 3, since any outcome of
this type must have r S's and (n − r) F's in the set of n draws. The
other part is the number of mutually exclusive ways in which the r S's
and the (n − r) F's can be arranged. In algebra this term is called the
number of combinations of r letters out of n letters. It is denoted by the
symbol $\binom{n}{r}$. The formula is

$$\binom{n}{r} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r(r-1)(r-2)\cdots(2)(1)}$$


For small samples these quantities, the binomial coefficients, can be 
written down by an old device known as Pascal's triangle, shown in table 
8.3.3. 

Each coefficient is the sum of the two just above it to the right and
the left. Thus, for n = 8, the number 56 = 21 + 35. Note that for any
n the coefficients are symmetrical, rising to a peak in the middle.
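Here is a minimal Python sketch of Pascal's rule. It assumes Python 3.8 or later for `math.comb`, and `pascal_rows` is a name chosen for illustration. Each new row is built by adding adjacent pairs from the row above. The loop also checks every row against the factorial formula for the number of combinations.

```python
from math import comb


def pascal_rows(n_max):
    """Rows 1..n_max of Pascal's triangle; row n lists the coefficients for sample size n."""
    rows = []
    row = [1]                                   # row for n = 0
    for _ in range(n_max):
        # Each inner entry is the sum of the two entries just above it.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
        rows.append(row)
    return rows


for n, row in enumerate(pascal_rows(8), start=1):
    # Cross-check each row against the factorial formula for C(n, r).
    assert row == [comb(n, r) for r in range(n + 1)]
    print(n, row)
```

Pascal's rule needs only additions, which is why it was the practical choice for hand work. The factorial formula is more convenient when a single coefficient is wanted for a large n.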

Putting the two parts together, the probability of r successes in a
sample of size n is

$$\frac{n(n-1)(n-2)\cdots(n-r+1)}{r(r-1)(r-2)\cdots(2)(1)}\, p^r q^{n-r}$$

These probabilities are the successive terms in the expansion of the bi-
nomial expression (q + p)ⁿ. This fact explains why the distribution is
called binomial, and also verifies that the sum of the probabilities is 1,
since (q + p)ⁿ = (1)ⁿ = 1.
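The two parts can be multiplied directly in code. This is a sketch under the same independence assumption. `binomial_pmf` is a hypothetical helper name, and `math.comb` (Python 3.8+) supplies the binomial coefficient.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials:
    the number of arrangements C(n, r) times the term p^r q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


n, p = 8, 0.2
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
print([round(x, 4) for x in probs])
print(sum(probs))   # (q + p)**n = 1, apart from floating-point rounding
```

The list printed for n = 8, p = 0.2 corresponds to the top panel of figure 8.3.1. Changing `p` to 0.5 or 0.9 gives the other two panels.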


TABLE 8.3.3
Binomial Coefficients Given by Pascal's Triangle

 Size of
 Sample, n                    Binomial Coefficients

     1                          1   1
     2                        1   2   1
     3                      1   3   3   1
     4                    1   4   6   4   1
     5                  1   5  10  10   5   1
     6                1   6  15  20  15   6   1
     7              1   7  21  35  35  21   7   1
     8            1   8  28  56  70  56  28   8   1
    etc.
Fig. 8.3.1 — Binomial distributions for n = 8 (horizontal axis: number of successes).
Top: p = 0.2. Middle: p = 0.5. Bottom: p = 0.9.


For n = 8, figure 8.3.1 shows these distributions for p = 0.2, 0.5, and
0.9. The distribution is positively skew for p less than 0.5 and negatively
skew for p greater than 0.5. For p = 0.5 the general shape, despite the
discreteness, bears some resemblance to a normal distribution.

Reference (16) contains extensive tables of individual and cumulative
terms of the binomial distribution for n up to 49; reference (17) has
cumulative terms up to n = 1,000.

8.4 — Sampling the binomial distribution. As usual, you will find it
instructive to verify the preceding theory by sampling. The table of
random digits (table A 1, p. 543) is very convenient for drawing samples
from the binomial with n = 5, since the digits in a row are arranged in
groups of 5. For instance, to sample the binomial with p = 0.2, let the
digits 0 and 1 represent a success, and all other digits a failure. By
recording the total number of 0's and 1's in each group of 5, many samples
from n = 5, p = 0.2 can be drawn quickly. Table 8.4.1 shows the results
of 100 drawings of this type, and illustrates a common method of tallying
the results. A slanting line is used at every fifth tally, so that ||||/
represents 5 drawings of a particular number of successes.
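Readers without a table of random digits can imitate the experiment with a pseudo-random number generator. In this sketch, Python's `random.Random` stands in for table A 1. This is an assumption: the digits differ, so the frequencies will not reproduce table 8.4.1 exactly. `tally_digit_samples` is a hypothetical name.

```python
import random
from collections import Counter


def tally_digit_samples(num_samples=100, group_size=5, seed=1):
    """Imitate table 8.4.1: draw groups of random digits 0-9 and count, in each
    group, how many digits are 0 or 1 (a 'success', so p = 2/10 = 0.2)."""
    rng = random.Random(seed)                   # seeded so runs are reproducible
    counts = Counter()
    for _ in range(num_samples):
        group = [rng.randrange(10) for _ in range(group_size)]
        successes = sum(1 for digit in group if digit in (0, 1))
        counts[successes] += 1
    # Frequencies of 0, 1, ..., group_size successes.
    return [counts[r] for r in range(group_size + 1)]


print(tally_digit_samples())
```

Raising `num_samples` makes the relative frequencies settle close to the theoretical probabilities of the binomial with n = 5, p = 0.2.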

TABLE 8.4.1
Tallying of 100 Drawings From the Binomial With n = 5, p = 0.2

 No. of
 Successes   Tally                                                 Total
    0        ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ ||                  32
    1        ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ ||||    44
    2        ||||/ ||||/ ||||/ ||                                    17
    3        ||||/ |                                                  6
    4        |                                                        1
    5                                                                 0
 Total                                                              100

To fit the corresponding theoretical distribution, first calculate the
terms pʳqⁿ⁻ʳ. For r = 0 (no successes) this is qⁿ = (0.8)⁵ = 0.32768. For
r = 1, it is pqⁿ⁻¹ = (0.2)(0.8)⁴. To obtain a shortcut, notice that this term
can be written as (qⁿ)(p/q). It is computed from the previous term by
multiplying by p/q = 0.2/0.8 = 1/4. Thus for r = 1 the term is
(0.32768)/4 = 0.08192. Similarly, the term for r = 2, p²qⁿ⁻², is found by
multiplying the term for r = 1 by (p/q), and so on for each successive term.

The details appear in table 8.4.2. The binomial coefficients are read 
from Pascal’s triangle. These coefficients and the terms in p r q n ~ r are 
multiplied to give the theoretical probabilities of 0, 1, 2, ... 5 successes. 
Finally, since N = 100 samples were drawn, we multiply each probability 
by 100 to give the expected frequencies of 0, 1, 2, ... 5 successes. 


TABLE 8.4.2
Fitting the Theoretical Binomial for n = 5, p = 0.2

 No. of          Term        Binomial       Probability           Expected     Observed
 Successes (r)   pʳqⁿ⁻ʳ      Coefficient    (Coefficient × Term)  Frequency    Frequency
    0            0.32768         1             0.32768              32.77         32
    1            0.08192         5             0.40960              40.96         44
    2            0.02048        10             0.20480              20.48         17
    3            0.00512        10             0.05120               5.12          6
    4            0.00128         5             0.00640               0.64          1
    5            0.00032         1             0.00032               0.03          0
 Total                                         1.00000             100.00        100
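The shortcut used in table 8.4.2 is easy to program: start from qⁿ and multiply by p/q for each further success. The sketch below follows that recipe. The helper name `fit_binomial` is hypothetical, and it returns one row per value of r in the same column order as the table.

```python
from math import comb


def fit_binomial(n, p, num_samples):
    """Rows (r, term, coefficient, probability, expected frequency), computing
    p^r q^(n-r) by the shortcut: start at q^n, then multiply by p/q."""
    q = 1 - p
    ratio = p / q
    term = q**n                          # r = 0 term
    rows = []
    for r in range(n + 1):
        coef = comb(n, r)
        prob = coef * term               # binomial coefficient times p^r q^(n-r)
        rows.append((r, term, coef, prob, num_samples * prob))
        term *= ratio                    # next term: p^(r+1) q^(n-r-1)
    return rows


observed = [32, 44, 17, 6, 1, 0]         # table 8.4.1
for (r, term, coef, prob, expected), obs in zip(fit_binomial(5, 0.2, 100), observed):
    print(f"{r}  {term:.5f}  {coef:>2}  {prob:.5f}  {expected:6.2f}  {obs}")
```

The recursion avoids raising p and q to separate powers for every r. For hand work this is a real saving. In code it mainly serves to mirror the table.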


Because of sampling variation, the expected and observed frequencies 
do not agree exactly, but their closeness is reassuring. Later (section 9.4) 
a method is given for testing whether the observed and expected fre- 
quencies differ by no more than is usual from sampling variation. In the 
present example, the agreement is in fact better than is usually found in
such sampling experiments (example 9.4.1). 

EXAMPLE 8.4.1 — With n = 2, p = 1/2, show that the probability of one success is
1/2. If p differs from 1/2, does the probability of one success increase or decrease?

EXAMPLE 8.4.2 — A railway company claims that 95% of its trains arrive on time. If a
man travels on three of these trains, what is the probability that (i) all three arrive on time,
(ii) one of the three is late, assuming that the claim is correct? Ans. (i) 0.857, (ii) 0.135.

EXAMPLE 8.4.3 — Assuming that the probability that a child is male is 1/2, find the
probability that in a family of 6 children there are: (i) no boys, (ii) exactly 3 boys, (iii) at
least 2 girls, (iv) at least one girl and 1 boy. Ans. (i) 1/64, (ii) 5/16, (iii) 57/64, (iv) 31/32.

EXAMPLE 8.4.4 — Work out the terms of the binomial distribution for n = 4, p = 0.4.
Verify that: (i) the sum of the terms is unity, (ii) 1 and 2 successes are equally probable,
(iii) 0 successes is about five times as probable as 4 successes.

EXAMPLE 8.4.5 — By extending Pascal's triangle, obtain the binomial coefficients for
n = 10. Hence compute and graph the binomial distribution for n = 10, p = 1/2. Does the
shape appear similar to a normal distribution? Hint: when p = 1/2, the term pʳqⁿ⁻ʳ = 1/2ⁿ
for any r. Since 2¹⁰ = 1,024 ≈ 1,000, the distribution is given accurately enough for
graphing by simply dividing the binomial coefficients by 1,000.


8.5 — Mean and standard deviation of the binomial distribution. If

$$f_r = \frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots(2)(1)}\, p^r q^{n-r}$$

denotes the binomial probability of r successes in a sample of size n, the
mean and variance of the distribution of the number of successes r are
defined by the equations

$$\mu = \sum_{r=0}^{n} r f_r, \qquad \sigma^2 = \sum_{r=0}^{n} (r - \mu)^2 f_r \qquad (8.5.1)$$

Note the formula for σ². In a theoretical distribution, σ² is the average
value of the squared deviation from the population mean. Each squared
deviation, (r − μ)², is multiplied by its relative frequency of occurrence, fᵣ.
The concept of number of degrees of freedom does not come in.

By algebra, it is found from (8.5.1) that

$$\mu = np, \qquad \sigma^2 = npq, \qquad \sigma = \sqrt{npq} \qquad (8.5.2)$$

These results apply to the number of successes. Often, interest centers on
the proportion of successes, r/n. For this,

$$\mu = p, \qquad \sigma^2 = pq/n, \qquad \sigma = \sqrt{pq/n} \qquad (8.5.3)$$

Sometimes results are presented in terms of the percentage of successes,
100r/n. Formulas (8.5.3) also hold for the percentage of successes if p now
stands for the percentage in the population and q = 100 − p.

As illustrations, the formulas work out as follows for n = 64, p = 0.2:

 Number:      μ = (64)(0.2) = 12.8;   σ = √[(64)(0.2)(0.8)] = √10.24 = 3.2
 Proportion:  μ = 0.2;                σ = √[(0.2)(0.8)/64] = √0.0025 = 0.05
 Percentage:  μ = 20;                 σ = √[(20)(80)/64] = √25 = 5
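Formulas (8.5.2) and (8.5.3) can be checked numerically against the definitions (8.5.1). This sketch sums r·fᵣ and (r − μ)²·fᵣ directly. `binomial_moments` is an illustrative name, not from the text.

```python
from math import comb, sqrt


def binomial_moments(n, p):
    """Mean and variance of the number of successes, computed directly
    from the definitions (8.5.1) by summing over r = 0..n."""
    f = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
    mu = sum(r * fr for r, fr in enumerate(f))
    var = sum((r - mu) ** 2 * fr for r, fr in enumerate(f))
    return mu, var


n, p = 64, 0.2
q = 1 - p
mu, var = binomial_moments(n, p)
print(mu, n * p)                         # number: both about 12.8
print(sqrt(var), sqrt(n * p * q))        # number: both about 3.2
print(sqrt(p * q / n))                   # proportion: about 0.05
print(sqrt((100 * p) * (100 * q) / n))   # percentage: about 5
```

Calling `binomial_moments(5, 0.2)` gives μ = 1 and σ² = 0.8. These are the values asked for in example 8.5.2.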
For a sample of fixed size n, the standard deviations √(npq) for the
number of successes and √(pq/n) for the proportion of successes are greatest
when p = 1/2. As p moves towards either 0 or 1, the standard deviation
declines, though quite slowly at first, as the following table of √(pq) shows.


 p          0.5      0.4 or 0.6    0.3 or 0.7    0.2 or 0.8    0.1 or 0.9
 √(pq)      0.500    0.490         0.458         0.400         0.300
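The table of √(pq) can be regenerated with a few lines of Python. `sd_factor` is a hypothetical name. Note that √(pq) is unchanged when p and q are interchanged, which is why each column covers two values of p.

```python
from math import sqrt


def sd_factor(p):
    """sqrt(pq): the S.D. of a single 0/1 observation; symmetric in p and q = 1 - p."""
    return sqrt(p * (1 - p))


for p in (0.5, 0.4, 0.3, 0.2, 0.1):
    print(f"p = {p} or {1 - p:.1f}:  sqrt(pq) = {sd_factor(p):.3f}")
```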


EXAMPLE 8.5.1 — For the binomial distribution of the number of successes with n = 2
(given in table 8.3.1, p. 203), verify from formulas 8.5.1 that μ = 2p, σ² = 2pq.

EXAMPLE 8.5.2 — For the binomial distribution with n = 5, p = 0.2, given in table
8.4.2, compute Σ r fᵣ and Σ(r − μ)² fᵣ and verify that the results are μ = 1 and σ² = 0.80.

EXAMPLE 8.5.3 — For n = 96, p = 0.4, calculate the S.D.'s of: (i) the number, (ii) the
percentage of successes. Ans. (i) 4.8, (ii) 5.

EXAMPLE 8.5.4 — An investigator intends to estimate, by random sampling from a
large file of house records, the percentage of houses in a town that have been sold in the
last year. He thinks that p is about 10% and would like the standard deviation of his esti-
mated percentage to be about 1%. How large should n be? Ans. 900 houses.

There is an easy way of obtaining the results μ = p and σ² = pq/n for
the distribution of the proportion of successes r/n in a sample of size n.
Attach the number 1 to every success in the population and the number 0
to every failure. Instead of thinking of the population as a large collection
of the letters S and F, we think of it as a large collection of 1's and 0's.
It is the population distribution of a variable X that takes only two values:
1 with relative frequency p and 0 with relative frequency q. The popula-
tion mean and variance of the new variate X are easily found by working
out the definitions (8.5.1),

$$\mu_X = \sum X f_X, \qquad \sigma_X^2 = \sum (X - \mu_X)^2 f_X$$

where the sum extends only over the two values X = 0 and X = 1, as
shown below:


 X       f_X     X f_X     X − μ_X     (X − μ_X)²     (X − μ_X)² f_X
 0        q        0         −p           p²              p²q
 1        p        p        1 − p         q²              q²p
 Total    1        p                                      pq

The variate X has population mean p and population variance pq, since
p²q + q²p = pq(p + q) = pq.

Now draw a random sample of size n. If the sample contains r suc-
cesses, then ΣX, taken over the sample, is r, so that X̄ = ΣX/n is r/n, the
sample proportion of successes. But we know that the mean of a random
sample from any distribution is an unbiased estimate of the population
mean, and has variance σ²/n (section 2.11). Hence X̄ = r/n is an unbiased
estimate of p, with variance σ_X²/n = pq/n.
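The 0/1 argument can also be checked by simulation. This sketch scores successes as 1 and failures as 0, so that each sample mean is exactly r/n. It then compares the mean and variance of the simulated proportions with p and pq/n. The function name, n = 25, p = 0.2, 20,000 repetitions and the seed are all arbitrary assumptions.

```python
import random


def simulate_proportions(n, p, reps=20000, seed=3):
    """Score each member 1 (success, probability p) or 0 (failure), draw `reps`
    samples of size n, and return each sample mean, which equals r/n."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        xs = [1 if rng.random() < p else 0 for _ in range(n)]
        props.append(sum(xs) / n)               # X-bar = (sum of X)/n = r/n
    return props


n, p = 25, 0.2
props = simulate_proportions(n, p)
m = sum(props) / len(props)
v = sum((x - m) ** 2 for x in props) / len(props)
print(round(m, 4), p)                       # mean of r/n: close to p = 0.2
print(round(v, 5), p * (1 - p) / n)         # variance of r/n: close to pq/n = 0.0064
```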
Fig. 8.6.1 — The solid vertical lines show the binomial distribution of the number of
successes for n = 10, p = 0.5. The curve is the normal approximation to this distribution,
which has mean np = 5 and S.D. √(npq) = 1.581.


Further, since X̄ = r/n is the mean of a sample from a population
that has a finite variance pq, we can quote the Central Limit Theorem
(section 2.12). This states that the mean X̄ of a random sample from any
population with finite variance tends to normality. Hence, as n increases,
the binomial distribution of r/n or of r approaches the normal distribution.

For p = 0.5 the normal is a good approximation when n is as low as 
10. As p approaches 0 or 1, some skewness remains in the binomial 
distribution until n is large. 

8.6 — The normal approximation and the correction for continuity. The
solid vertical lines in figure 8.6.1 show the binomial distribution of r for
n = 10, p = 0.5. Also shown is the approximating normal curve, with
mean np = 5 and S.D. √(npq) = 1.581. The normal seems a good ap-
proximation to the shape of the binomial.

One difference, however, is that the binomial is discrete, having
probability only at the values r = 0, 1, 2, ... 10, while the normal has
probability in any interval from −∞ to ∞. This raises a problem:
in estimating the binomial probability of, say, 4 successes, what part of
the normal curve do we use as an approximation? We need to set up a
correspondence between the set of binomial ordinates and the areas under
the normal curve.

The simplest way of doing this is to regard the binomial as a grouping
of the normal into unit class intervals. Under this rule the binomial
ordinate at 4 corresponds to the area under the normal curve from 3½
to 4½. The ordinate at 5 corresponds to the area from 4½ to 5½, and so
on. The ordinate at 10 corresponds to the normal area from 9½ to 10½.
These class boundaries are the dotted lines in figure 8.6.1.

In the commonest binomial problems we wish to calculate the prob-
abilities at the ends of the distribution; for instance, the probability of 8
or more successes. The exact result, found by adding the binomial prob-
abilities for r = 8, 9, 10, is 56/1024 = 0.0547. Under our rule, the cor-
responding area under the normal curve is the area from 7½ to ∞, not
the area from 8 to ∞. The normal deviate is therefore z = (7.5 − 5)/1.581,
which by a coincidence is also 1.581. The approximate probability from
the normal table is P = 0.0570, close enough to 0.0547. Use of
z = (8 − 5)/1.581 gives P = 0.0288, a poor result.

Similarly, the probability of 4 or fewer successes is approximated by
the area of the normal curve from −∞ to 4½. The general rule is to de-
crease the absolute value of (r − np) by ½. Thus,

$$z_c = \frac{|r - np| - \tfrac{1}{2}}{\sqrt{npq}}$$

The subtraction of ½ is called the correction for continuity. It is simple to
apply and usually improves the accuracy of the normal approximation,
although when n is large it has only a minor effect.

If you are working in terms of r/n instead of r, then

$$z_c = \frac{|r/n - p| - 1/(2n)}{\sqrt{pq/n}}$$
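The worked example for n = 10, p = 0.5 can be reproduced with Python's standard library. `math.erfc` gives the normal tail area without a table. This is a sketch, and `upper_normal_tail` is a hypothetical helper.

```python
from math import comb, erfc, sqrt


def upper_normal_tail(z):
    """P(Z >= z) for a standard normal deviate, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))


n, p = 10, 0.5
q = 1 - p
mean, sd = n * p, sqrt(n * p * q)

# Exact binomial probability of 8 or more successes (56/1024 when p = 1/2).
exact = sum(comb(n, r) * p**r * q**(n - r) for r in range(8, n + 1))
corrected = upper_normal_tail((7.5 - mean) / sd)    # area from 7.5 upward
uncorrected = upper_normal_tail((8 - mean) / sd)    # area from 8 upward
print(round(exact, 4), round(corrected, 4), round(uncorrected, 4))
```

The three printed values should be close to 0.0547, 0.0570 and 0.0288. Any difference in the last digit arises because the text rounds z before entering the normal table.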

EXAMPLE 8.6.1 — For n = 10, p = 1/2, calculate: (i) the exact probability of 4 or
fewer successes, and the normal approximation, (ii) corrected for continuity, (iii) uncor-
rected. Ans. (i) 0.377, (ii) 0.376, (iii) 0.263.

EXAMPLE 8.6.2 — In a sample of size 49 with p = 0.2, the expected number of successes
is 9.8. An investigator is interested in the probability that the observed number of successes
will be (i) 15 or more, or (ii) 5 or less. Estimate these two probabilities by the corrected
normal approximation. Ans. (i) 0.0466, (ii) 0.0623. The exact answers by summing the
binomial are: (i) 0.0517, (ii) 0.0547. Because of the skewness (p = 0.2), the normal curve
underestimates in the long tail and overestimates in the short tail. For the sum of the two
tails the normal curve does better, giving 0.1089 as against the exact 0.1064.

EXAMPLE 8.6.3 — With n = 16, p = 0.9, estimate by the normal curve the probability
that 16 successes are obtained. The exact result is, of course, (0.9)¹⁶ = 0.185. Ans. 0.180.

8.7 — Confidence limits for a proportion. If r members out of a sample
of size n are found to possess some attribute, the sample estimate of the
proportion in the population possessing this attribute is p̂ = r/n. In large
samples, as we have seen, the binomial estimate p̂ is approximately
normally distributed about the population proportion p with standard
deviation √(pq/n). For the true but unknown standard deviation √(pq/n)
we substitute the sample estimate √(p̂q̂/n), where q̂ = 1 − p̂. Hence, the
probability is approximately 0.95 that p̂ lies between the limits

$$p - 1.96\sqrt{\hat{p}\hat{q}/n} \quad\text{and}\quad p + 1.96\sqrt{\hat{p}\hat{q}/n}$$

But this statement is equivalent to saying that p lies between

$$\hat{p} - 1.96\sqrt{\hat{p}\hat{q}/n} \quad\text{and}\quad \hat{p} + 1.96\sqrt{\hat{p}\hat{q}/n} \qquad (8.7.1)$$

unless we were unfortunate in drawing one of the extreme samples that
turns up once in twenty times. The limits 8.7.1 are therefore the ap-
proximate 95% confidence limits for p.

For example, suppose that 200 individuals in a sample of 1,000
possess the attribute. The 95% confidence limits for p are

$$0.2 \pm 1.96\sqrt{(0.2)(0.8)/1000} = 0.2 \pm 0.025$$

The confidence interval extends from 0.175 to 0.225; that is, from 17.5%
to 22.5%. Limits corresponding to other confidence probabilities are of
course obtained by inserting the appropriate values of the normal deviate.
For 99% limits, we replace 1.96 by 2.576.

If the above reasoning is repeated with the correction for continuity
included, the 95% limits for p become

$$\hat{p} \pm \left\{1.96\sqrt{\hat{p}\hat{q}/n} + \frac{1}{2n}\right\}$$

The correction is easily applied. It amounts to widening the limits a
little. We recommend that the correction be used as a standard practice,
although it makes little difference when n is large. To illustrate the cor-
rection in a smaller sample, suppose that 10 families out of 50 report
ownership of more than one car, giving p̂ = 0.2. The 95% confidence
limits for p are

$$0.2 \pm \left\{1.96\sqrt{0.16/50} + 0.01\right\} = 0.2 \pm 0.12,$$

or 0.08 and 0.32. More exact limits for this problem, computed from the
binomial distribution itself, were presented in table 1.4.1 (p. 6) as 0.10
and 0.34. The normal approximation gives the correct width of the
interval, 0.24, but the normal limits are symmetrical about p̂, whereas the
correct limits are displaced upwards because an appreciable amount of
skewness still remains in the binomial when n = 50 and p is not near 1/2.

If you prefer to express p̂ and p in percentages, the 95% limits are

$$\hat{p} \pm \left\{1.96\sqrt{\hat{p}(100 - \hat{p})/n} + \frac{50}{n}\right\}$$

You may verify that this formula gives 8% and 32% as the limits in the
above problem.
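Here is a sketch of the approximate limits (8.7.1), with the continuity correction 1/(2n) as an option. `approx_ci` is an illustrative name. The default z = 1.96 gives 95% limits; use 2.576 for 99%.

```python
from math import sqrt


def approx_ci(r, n, z=1.96, continuity=True):
    """Normal-approximation confidence limits (8.7.1) for a population
    proportion, optionally widened by the continuity correction 1/(2n)."""
    p_hat = r / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    if continuity:
        half_width += 1 / (2 * n)
    return p_hat - half_width, p_hat + half_width


print(approx_ci(200, 1000, continuity=False))            # about (0.175, 0.225)
print(approx_ci(10, 50))                                 # about (0.08, 0.32)
print(approx_ci(200, 1000, z=2.576, continuity=False))   # 99% limits
```

These limits share the weakness noted above: they are symmetrical about p̂. For small n or p far from 1/2, the exact binomial limits are preferable.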

8.8 — Test of significance of a binomial proportion. The normal ap-
proximation is useful also in testing the null hypothesis that the population
proportion of successes has a known value p. If the null hypothesis is
true, p̂ is distributed approximately normally with mean p and S.D.
√(pq/n). With the correction for continuity, the normal deviate is

$$z_c = \frac{|\hat{p} - p| - 1/(2n)}{\sqrt{pq/n}} = \frac{|r - np| - \tfrac{1}{2}}{\sqrt{npq}}$$

This can be referred to the normal tables to compute the probability of
getting a sample proportion as divergent as the observed one.

To take an example considered in chapter 1, a physician found 480
men and 420 women among 900 admitted to a hospital with a certain
disease. Is this result consistent with the hypothesis that in the population
of hospital patients, half the cases are male? Taking r as the number of
males,

$$z_c = \frac{|480 - 450| - \tfrac{1}{2}}{\sqrt{(900)(\tfrac{1}{2})(\tfrac{1}{2})}} = \frac{29.5}{15} = 1.97$$

Since the probability is just on the 5% level, the null hypothesis is rejected
at this level.

If the alternative hypothesis is one-tailed, for instance that more than
half the hospital patients are male, only one tail of the normal distribution
is used. For this alternative the null hypothesis in the example is rejected
at the 2½% level.
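The hospital example can be reproduced as follows. The helper `binomial_z_test` is hypothetical. Its one-tailed P is meaningful only when the alternative points in the direction of the observed deviation.

```python
from math import erfc, sqrt


def binomial_z_test(r, n, p0):
    """Continuity-corrected normal test of H0: population proportion = p0.
    Returns z_c and the one- and two-tailed P values."""
    q0 = 1 - p0
    z_c = (abs(r - n * p0) - 0.5) / sqrt(n * p0 * q0)
    one_tailed = 0.5 * erfc(z_c / sqrt(2))      # P(Z >= z_c)
    return z_c, one_tailed, 2 * one_tailed


z_c, p_one, p_two = binomial_z_test(480, 900, 0.5)
print(round(z_c, 3), round(p_one, 4), round(p_two, 4))
```

With these data z_c is about 1.967. The two-tailed P falls just below 0.05 and the one-tailed P just below 0.025, as stated above.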

In sections 1.10–1.12 you were given another method of testing a null
hypothesis about p by means of chi-square with 1 degree of freedom. In
the notation of chapter 1,

$$\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}} = \sum \frac{(f - F)^2}{F}$$

the sum being taken over the two classes, male and female. The χ² test
is exactly the same as the two-tailed z test, except that the above formula
for χ² contains no correction for continuity. To show the relationship,
we need to translate the notation of chapter 1 into the present notation, as
follows:


                    Notation of       Present Notation
                    Chapter 1         Males        Females
 Observed nos.          f             r            n − r
 Expected nos.          F             np           nq = n − np
 Obs − Exp            f − F           r − np       −(r − np)


Hence,

$$\chi^2 = \sum \frac{(f - F)^2}{F} = \frac{(r - np)^2}{np} + \frac{(r - np)^2}{nq} = \frac{(r - np)^2}{npq}(q + p) = \frac{(r - np)^2}{npq} = z^2,$$

since the normal deviate z = (r − np)/√(npq) if no correction for continu-
ity is used. Further, the χ² distribution, with 1 d.f., is the distribution of
the square of a normal deviate: the 5% significance level of χ², 3.84, is
simply the square of 1.96. Thus, the two tests are identical.

To correct χ² for continuity, we use the square of z, corrected for
continuity:

$$\chi_c^2 = \frac{(|r - np| - \tfrac{1}{2})^2}{npq}$$

As with z, we recommend that the correction be applied routinely. For
one-sided alternatives the z method is preferable, since χ² takes no ac-
count of the sign of (r − np) and is basically two-sided.
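The equivalence χ² = z² can be verified numerically. In this sketch, `chi_square_two_classes` is an illustrative name. The corrected deviation is not allowed to fall below zero, following the reasoning of example 8.8.2.

```python
def chi_square_two_classes(r, n, p0, continuity=False):
    """Chi-square (1 d.f.) for r observed 'successes' and n - r 'failures'
    against expected n*p0 and n*q0; equals z**2, or z_c**2 when corrected."""
    q0 = 1 - p0
    dev = abs(r - n * p0)               # same size, opposite sign, in both classes
    if continuity:
        dev = max(dev - 0.5, 0.0)       # never "correct" past zero (example 8.8.2)
    return dev**2 / (n * p0) + dev**2 / (n * q0)


chi2 = chi_square_two_classes(480, 900, 0.5)
chi2_c = chi_square_two_classes(480, 900, 0.5, continuity=True)
print(chi2, chi2_c)    # 4.0, and about 3.87 (the square of z_c = 1.967)
```

For example 8.8.1 (13 of 20 accidents, p0 = 1/2), the same function gives 1.8 uncorrected and 1.25 corrected.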

EXAMPLE 8.8.1 — Two workers A and B perform a task in which carelessness leads to
minor accidents. In the first 20 accidents, 13 happened to A and 7 to B. In a previous ex-
ample (1.15.1) you were asked to calculate χ² for testing the null hypothesis that A and B
are equally likely to have accidents, the answer being χ² = 1.8, with P about 0.18. Re-
calculate χ² and P, corrected for continuity. Ans. corrected χ² = 1.25, P slightly greater than 0.25.

EXAMPLE 8.8.2 — A question that is asked occasionally is whether the ½ correction
should be applied in χ² if |r − np| is less than ½. This happens, for instance, if r = 6,
n = 25 and the null hypothesis is p = 1/4, because np = 6.25 and |r − np| = 0.25. Strictly,
the answer in such cases is that the corrected value of χ² is zero. When n = 25, the result
r = 6 is the sample result that gives the closest possible agreement with the null hypothesis,
np = 6.25. Hence, all possible samples with n = 25 give results at least as divergent from the
null hypothesis. The significance P is therefore 1, corresponding to χ² = 0.

8.9 — The comparison of proportions in paired samples. A comparison
of two sample proportions may arise either in paired or in independent
samples. To illustrate paired samples, suppose that a lecture method is
being compared with a method that uses a machine for programmed
learning but no lecture, the objective being to teach workers how to per-
form a rather complicated operation. The workers are first grouped into
pairs by means of an initial estimate of their aptitudes for this kind of task.
One member of each pair is assigned at random to each method. At the
end, each student is tested to see whether he succeeds or fails in a test on
the operation.

With 100 pairs, the results might be presented as follows:

 Result for Method
   A       B          No. of Pairs
   S       S               52
   S       F               21
   F       S                9
   F       F               18
 Total                    100

In 52 pairs, both workers succeeded in the test; in 21 pairs, the
worker taught by method A succeeded, but his partner taught by method
B failed, and so on.

As a second illustration (2), different media for growing diphtheria
bacilli were compared. Swabs were taken from the throats of a large
number of patients with symptoms suggestive of the presence of diphtheria
bacilli. From each swab, a sample was grown on each medium. After
allowing time for growth, each culture was examined for the presence or
absence of the bacilli. A successful medium is one favorable to the
growth of the bacilli so that they are detected. This is an example of self-
pairing, since each medium is tested on every patient. It is also an example
in which a large number of FF's would be expected, because diphtheria
is now rare and many patients would actually have no diphtheria bacilli in
their throats.

Consider first the test of significance of the null hypothesis that the
proportion of successes is the same for the two methods or media. The
SS and FF pairs are ignored in the test of significance, since they give no
indication in favor of either A or B. We concentrate on the SF and FS
pairs. If the null hypothesis is true, the population must contain as many
SF as FS pairs. In the numerical example there are 21 + 9 = 30 pairs of
the SF or FS types. Under the null hypothesis we expect 15 of each type
as against 21 and 9 observed.

Hence, the null hypothesis is tested by either the χ² or the z test of
the preceding section. (In the z test we take n = 30, r = 21, p = 1/2.)
When p = 1/2, χ² takes the particularly simple form (section 5.4)

$$\chi_c^2 = \frac{(|21 - 9| - 1)^2}{30} = \frac{121}{30} = 4.03$$

with 1 d.f. The null hypothesis is rejected at the 5% level (3.84). Method
A has given a significantly higher proportion of successes. Remember
that in this test, the denominator of χ² is always the total number of SF
and FS pairs. This test is the same as the sign test (section 5.4).
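The paired test needs only the counts of SF and FS pairs. Here is a sketch. `paired_chi_square` is a hypothetical name, and the statistic is widely known as McNemar's test. As in example 8.8.2, the corrected deviation is kept from going below zero.

```python
def paired_chi_square(n_sf, n_fs):
    """Continuity-corrected chi-square (1 d.f.) for paired proportions:
    only the SF and FS pairs enter, and the denominator is their total."""
    corrected_dev = max(abs(n_sf - n_fs) - 1, 0)
    return corrected_dev**2 / (n_sf + n_fs)


chi2_pairs = paired_chi_square(21, 9)
print(round(chi2_pairs, 2), chi2_pairs > 3.84)   # 4.03 True
```

The SS and FF counts (52 and 18) are deliberately not arguments. They carry no information about which method is better.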

The investigator will also be interested in the actual percentages of
successes given by the two methods. These were: 52 + 21 = 73% for A
and 52 + 9 = 61% for B. If the task is exceptionally difficult, he might
conclude that although A is significantly better than B, both methods are
successful enough to be useful. In other circumstances, he might report
that neither method is satisfactory. This might be the case if A and B
were two new techniques for predicting some feature of the weather, and
if standard techniques were known to give more than 85% successes.

When there is clearly a difference between the performances of the
two methods, we may wish to report this difference, (73% − 61%) = 12%,
along with its standard error. Let

 p_SF = proportion of SF pairs = 21/100 = 0.21
 p_FS = proportion of FS pairs = 9/100 = 0.09

When the difference is expressed in percentages (12%), a simple formula for
its standard error, with n = 100 pairs, is

$$\text{S.E.} = 100\sqrt{\frac{p_{SF} + p_{FS} - (p_{SF} - p_{FS})^2}{n}} = 100\sqrt{\frac{0.21 + 0.09 - (0.21 - 0.09)^2}{100}} = 10\sqrt{0.2856} = 5.3$$

If the difference is expressed in proportions, the factor 100 is omitted.
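Here is a sketch of the standard error calculation, with `paired_difference_se` as an illustrative name. It works in proportions and multiplies by 100 only for display.

```python
from math import sqrt


def paired_difference_se(n_sf, n_fs, n_pairs):
    """Standard error of the difference between the two success proportions
    in paired data, from the proportions of SF and FS pairs."""
    p_sf = n_sf / n_pairs
    p_fs = n_fs / n_pairs
    return sqrt((p_sf + p_fs - (p_sf - p_fs) ** 2) / n_pairs)


diff = (52 + 21) / 100 - (52 + 9) / 100           # 0.73 - 0.61 = 0.12
se = paired_difference_se(21, 9, 100)
print(round(100 * diff, 1), round(100 * se, 1))   # in percentages: 12.0 and 5.3
```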

Note: If you record only that A gave 73 successes and B gave 61 
successes out of 100, the test of significance in paired data cannot be made 
from this information alone. The classification of the results for the in- 
dividual pairs must be available. 

8.10 — Comparison of proportions in two independent samples: the
2 × 2 table. This problem occurs very often in investigative work. Many
controlled experiments which compare two procedures or treatments are
carried out with independent samples, because no effective way of pairing
the subjects or animals is known to the investigator. Comparison of
proportions in different groups is also common in non-experimental
studies. A manufacturer compares the proportions of defective articles
found in two separate sources of supply from which he buys these articles,
or a safety engineer compares the proportions of head injuries sustained
in automobile accidents by passengers with seat belts and those without
seat belts.

Alternatively, a single sample may be classified according to two dif-
ferent attributes. The data used to illustrate the calculations come from
a large Canadian study (3) of the relation between smoking and mor-
tality. By an initial questionnaire in 1956, male recipients of war pensions
were classified according to their smoking habits. We shall consider two
classes: (i) non-smokers and (ii) those who reported that they smoked
pipes only. For any pensioner who died during the succeeding six years,
a report of the death was obtained. Thus, the pensioners were classified
also according to their status (dead or alive) at the end of six years. Since
the probability of dying depends greatly on age, the comparison given
here is confined to men aged 60-64 at the beginning of the study. The
numbers of men falling in the four classes are given in table 8.10.1, called
a 2 × 2 contingency table.

TABLE 8.10.1
Men Classified by Smoking Habit and Mortality in Six Years

             Non-smokers    Pipe Smokers     Total
 Dead             117             54           171
 Alive            950            348         1,298
 Total          1,067            402         1,469
 % dead          11.0           13.4

It will be noted that 11.0% of the non-smokers had died, as against
13.4% of the pipe smokers. Can this difference be attributed to sampling
error, or does it indicate a real difference in the death rates in the two
groups? The null hypothesis is that the proportions dead, 117/1067 and
54/402, are estimates of the same quantity.

The test can be performed by χ². As usual,

$$\chi^2 = \sum \frac{(f - F)^2}{F}$$

where the f's are the observed numbers 117, 950, 54, 348 in the four cells.
The F's are the numbers that would be expected in the four cells if the
null hypothesis were true.

The F's are computed as follows. If the proportions dead are the
same for the two smoking classes, our best estimate of this proportion is
the proportion, 171/1469, found in the combined sample. Since there are
1067 non-smokers, the expected number dead, on the null hypothesis, is

$$\frac{(1067)(171)}{1469} = 124.2$$

The rule is: to find the expected number in any cell, multiply the cor-
responding column and row totals and divide by the grand total. The
expected number of non-smokers who are alive is

$$\frac{(1067)(1298)}{1469} = 942.8,$$

and so on. Alternatively, having calculated 124.2 as the expected number
of non-smokers who are dead, the expected number alive is found more
easily as 1067 − 124.2 = 942.8. Similarly, the expected number of pipe
smokers who are dead is 171 − 124.2 = 46.8. Finally, the expected num-
ber of pipe smokers who are alive is 402 − 46.8 = 355.2. Thus, only one
expected number need be calculated; the others are found by subtraction.
The observed numbers, expected numbers, and the differences (f − F)
appear in table 8.10.2.

Except for their signs, all four deviations (f − F) are equal. This result
holds in any 2 × 2 table.


TABLE 8.10.2
Values of f (Observed), F (Expected), and (f − F) in the Four Cells

 Cell                      f          F        f − F
 Dead, non-smokers        117       124.2      −7.2
 Dead, pipe smokers        54        46.8      +7.2
 Alive, non-smokers       950       942.8      +7.2
 Alive, pipe smokers      348       355.2      −7.2
Since (f − F)² is the same in all cells, χ² may be written

$$\chi^2 = (f - F)^2 \sum_{i=1}^{4} \frac{1}{F_i} \qquad (8.10.1)$$

$$\chi^2 = (7.2)^2\left(\frac{1}{124.2} + \frac{1}{46.8} + \frac{1}{942.8} + \frac{1}{355.2}\right) = (51.84)(0.0333) = 1.73$$

A table of reciprocals is useful in this calculation, since the four reciprocals
can be added directly.

How many degrees of freedom has χ²? Since all four deviations are
the same except for sign, this suggests that χ² has only 1 d.f., as was
proved by Fisher. With 1 d.f., table A 5 shows that a value of χ² greater
than 1.73 occurs with probability about 0.20. The observed difference
in proportion dead between the non-smokers and pipe smokers may well
be due to sampling errors.

The above $\chi^2$ has not been corrected for continuity. A correction is
appropriate because the exact distribution of $\chi^2$ in a 2 × 2 table is discrete.
With the same four marginal totals, the two sets of results that are closest
to our observed results are as follows:

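(i)
              Non-smokers   Pipe smokers   Total
Dead              118            53          171
Alive             949           349         1298
Total            1067           402
$f - F = \pm 6.2$

(ii)
              Non-smokers   Pipe smokers   Total
Dead              116            55          171
Alive             951           347         1298
Total            1067           402
$f - F = \pm 8.2$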


Since the expected values do not change, the values $(f - F)$ are ±6.2 in
(i) and ±8.2 in (ii), as against ±7.2 in our data. Thus, in the exact distribution of $\chi^2$ the values of $|f - F|$ jump by unity. The correction for
continuity is made by deducting 0.5 from $|f - F|$. The formula for corrected $\chi^2$ is

$$\chi_c^2 = (|f - F| - 0.5)^2 \sum \frac{1}{F} \qquad (8.10.2)$$

$$= (6.7)^2 (0.0333) = 1.49$$

The corrected P is about 0.22, little changed in this example because the 
samples are large. In small samples the correction makes a substantial 
difference. 

Some workers prefer an alternative formula for computing $\chi_c^2$. The
2 × 2 table may be represented in this way:


a            b            a + b
c            d            c + d
a + c        b + d        N = a + b + c + d

$$\chi_c^2 = \frac{N\left(|ad - bc| - N/2\right)^2}{(a + b)(c + d)(a + c)(b + d)} \qquad (8.10.3)$$
The subtraction of $N/2$ represents the correction for continuity.
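A Python sketch computing the corrected $\chi_c^2$ both by formula (8.10.2) and by formula (8.10.3) shows that the two routes agree, which is what example 8.10.5 asks you to prove. The function names are hypothetical, and the smoking data are laid out with a, b for non-smokers (dead, alive) and c, d for pipe smokers. Carried to full precision the smoking data give about 1.50; the 1.49 in the text comes from rounding $|f - F| - 0.5$ to 6.7 and $\sum 1/F$ to 0.0333.

```python
def chi_sq_corrected_deviation(a, b, c, d):
    """Formula (8.10.2): (|f - F| - 0.5)^2 times the sum of 1/F,
    for the 2 x 2 table with first row (a, b) and second row (c, d)."""
    N = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    F = [r * k / N for r in rows for k in cols]  # the four expected counts
    dev = abs(a - F[0])                          # same |f - F| in every cell
    return (dev - 0.5) ** 2 * sum(1 / x for x in F)


def chi_sq_corrected_shortcut(a, b, c, d):
    """Formula (8.10.3): N(|ad - bc| - N/2)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    N = a + b + c + d
    return N * (abs(a * d - b * c) - N / 2) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Smoking data: a, b = non-smokers dead, alive; c, d = pipe smokers dead, alive.
print(round(chi_sq_corrected_deviation(117, 950, 54, 348), 3))  # 1.497
print(round(chi_sq_corrected_shortcut(117, 950, 54, 348), 3))   # 1.497
```

The shortcut needs no expected numbers at all, which is why some workers prefer it for hand computation.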

In interpreting the results of these $\chi^2$ tests in non-experimental
studies, caution is necessary, particularly when $\chi^2$ is significant. The two
groups being compared may differ in numerous ways, some of which
may be wholly or partly responsible for an observed significant difference.
For instance, pipe smokers and non-smokers may differ to some extent
in their economic levels, residence (urban or rural), and eating and drinking habits, and these variables may be related to the risk of dying. Before
the investigator can claim that a significant difference is caused by the
variable under study, it is his responsibility to produce evidence that
disturbing variables of this type could not have produced the difference.
Of course, the same responsibility rests with the investigator who has done
a controlled experiment. But the device of randomization, and the greater
flexibility which usually prevails in controlled experimentation, make it
easier to ensure against misleading conclusions from disturbing influences.

EXAMPLE 8.10.1: In a study as to whether cancer of the breast tends to "run in
families," Murphy and Abbey (4) investigated the frequency of breast cancer found in relatives of (i) women with breast cancer, (ii) a comparison group of women without breast
cancer. The data below, slightly altered for easy calculation, refer to the mothers of the
subjects.



                               Breast Cancer in Subject
                               Yes       No      Total
Breast Cancer      Yes           7        3         10
in Mother          No          193      197        390
                   Total       200      200        400


Calculate $\chi^2$ and P (i) without correction, (ii) with correction for continuity, for testing the
null hypothesis that the frequency of cancer in mothers is the same in the two classes of
subjects. Ans. (i) $\chi^2 = 1.64$, P = 0.20; (ii) $\chi_c^2 = 0.92$, P = 0.34. Note that the correction
for continuity always increases P, that is, makes the difference less significant.

EXAMPLE 8.10.2: In the previous example, verify that the alternative formula
(8.10.3) for $\chi_c^2$ gives the same result, by showing that $\chi_c^2$ in (8.10.3) comes out as 12/13 = 0.92.

EXAMPLE 8.10.3: Dr. C. H. Richardson has furnished the following numbers of
aphids (Aphis rumicis L.) dead and alive after spraying with two concentrations of solutions
of sodium oleate:



                  Concentration of Sodium Oleate
                           (percentage)
                    0.65       1.10       Total
Dead                  55         62         117
Alive                 13          3          16
Total                 68         65         133
Per Cent Dead       80.9       95.4



Has the higher concentration given a significantly different per cent kill? Ans. $\chi_c^2 = 5.31$,
P < 0.025.







EXAMPLE 8.10.4: In examining the effects of sprays in the control of codling moth
injury to apples, Hansberry and Richardson (5) counted the wormy apples on each of 48
trees. Two trees sprayed with the same amount of lead arsenate yielded:

A: 2,130 apples, 1,299 or 61% of which were injured
B: 2,190 apples, 1,183 or 54% of which were injured

$\chi_c^2 = 21.16$ is conclusive evidence that the chance of injury was different in these two trees.
This result is characteristic of spray experiments. For some unknown reasons, injuries
under identical experimental treatments differ significantly. Hence it is undesirable to
compare sprays on single trees, because a difference in percentage of injured apples might be
due to these unknown sources rather than to the treatments. A statistical determination of
the homogeneity or heterogeneity of experimental material under identical conditions,
sometimes called a test of technique, is often worthwhile, particularly in new fields of research.

EXAMPLE 8.10.5: Prove that formulas (8.10.2) and (8.10.3) for $\chi_c^2$ are the same, by
showing that

$$|f - F| = \frac{|ad - bc|}{N}, \qquad \sum \frac{1}{F} = \frac{N^3}{(a + b)(c + d)(a + c)(b + d)}$$

8.11 — Test of the independence of two attributes. The preceding test 
is sometimes described as a test of the independence of two attributes. A
sample of people of a particular ethnic type might be classified into two
classes according to hair color and also into two classes according to color
of eyes. We might ask: are color of hair and color of eyes independent?
Similarly, the numerical example in the previous section might be referred to as a test of the question: Is the risk of dying independent of
smoking habit?

In this way of speaking, the word "independent" carries the same
meaning as it does in Rule 3 in the theory of probability. Let $p_A$ be the
probability that a member of a population possesses attribute A, and $p_B$
the probability that he possesses attribute B, with $q_A = 1 - p_A$ and $q_B = 1 - p_B$. If the attributes are independent, the probability that he possesses both attributes is $p_A p_B$. Thus,
on the null hypothesis of independence, the probabilities in the four cells
of the 2 × 2 contingency table are as follows:



                                   Attribute A
                           (1) Present   (2) Absent   Total
Attribute B  (1) Present   $p_A p_B$     $q_A p_B$     $p_B$
             (2) Absent    $p_A q_B$     $q_A q_B$     $q_B$
             Total         $p_A$         $q_A$         1


Two points emerge from this table. First, the null hypothesis can be
tested either by comparing the proportions of cases in which B is present
in columns (1) and (2), or by comparing the proportions of cases in which
A is present in rows (1) and (2). These two $\chi^2$ tests are exactly the same.
This is not obvious from the original expressions (8.10.1) and (8.10.2) given
for $\chi^2$ and $\chi_c^2$, but expression (8.10.3) makes it clear that the statement
holds: interchanging rows and columns swaps b and c, which leaves both $|ad - bc|$ and the product of the four marginal totals unchanged.
Secondly, the table provides a check on the rule given for calculating
the expected number in any cell. In a single sample of size N, we expect
to find $Np_A p_B$ members possessing both A and B. The sample total in
column (1) will be our best estimate of $Np_A$, while that in row (1) similarly
estimates $Np_B$. Thus the rule, (column total)(row total)/(grand total),
gives $(Np_A)(Np_B)/N = Np_A p_B$ as required.
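Both points can be confirmed numerically. In the Python sketch below (an illustration of ours, with hypothetical function names), the uncorrected $\chi^2$ is written as $N(ad - bc)^2/[(a + b)(c + d)(a + c)(b + d)]$, which is formula (8.10.3) with the $N/2$ term dropped; this form follows from the two identities in example 8.10.5.

```python
def chi_sq_uncorrected(a, b, c, d):
    """Uncorrected chi-square for the table [[a, b], [c, d]]: formula (8.10.3)
    with the N/2 continuity term dropped, i.e. N(ad - bc)^2 / (product of margins)."""
    N = a + b + c + d
    return N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


a, b, c, d = 117, 950, 54, 348  # smoking data of table 8.10.1
N = a + b + c + d

# Point 1: comparing rows or comparing columns gives the same test, because
# transposing the table swaps b and c and changes neither ad - bc nor the margins.
print(chi_sq_uncorrected(a, b, c, d) == chi_sq_uncorrected(a, c, b, d))  # True

# Point 2: N * pA * pB, with pA and pB estimated from the margins,
# equals (column total)(row total)/(grand total) for cell (1, 1).
pA_hat = (a + c) / N  # proportion in column (1)
pB_hat = (a + b) / N  # proportion in row (1)
print(round(N * pA_hat * pB_hat, 4), round((a + c) * (a + b) / N, 4))
```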

8.12 — A test by means of the normal deviate z. The null hypothesis 
can also be tested by computing a normal deviate z, derived from the
normal approximation to the binomial. The z and $\chi^2$ tests are identical.
Many investigators prefer the z form, because they are primarily interested
in the size of the difference $p_1 - p_2$ between the proportions found in two
independent samples. For illustration, we repeat the data from table
8.10.1.


TABLE 8.12.1

Men Classified by Smoking Habit and Mortality in Six Years

                    Sample (1)          Sample (2)
                    Non-smokers         Pipe Smokers        Total
Dead                117                 54                  171
Alive               950                 348                 1,298
Total               $n_1 = 1,067$       $n_2 = 402$         1,469
Proportion dead     $p_1 = 0.1097$      $p_2 = 0.1343$      $p = 0.1164$


Since $p_1 = 0.1097$ and $p_2 = 0.1343$ are approximately normally distributed, their difference $p_1 - p_2$ is also approximately normally distributed. The variance of this difference is the sum of the two variances
(section 4.7).


$$V(p_1 - p_2) = \sigma_1^2 + \sigma_2^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$$

Under the null hypothesis, $p_1 = p_2 = p$, so that $p_1 - p_2$ is approximately
normally distributed with mean 0 and standard error

$$\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$


The null hypothesis does not specify the value of p. As an estimate, 
we naturally use p = 0.1164 as given by the combined samples. Hence, 
the normal deviate z is 
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.1097 - 0.1343}{\sqrt{(0.1164)(0.8836)\left(\frac{1}{1067} + \frac{1}{402}\right)}} = \frac{-0.0246}{0.01877} = -1.31$$

In the normal table, ignoring the sign of z, we find P ≈ 0.19, in agreement
with the value found by the original $\chi^2$ test.

To correct z for continuity, subtract 1/2 from the numerator of the
larger proportion (in this case $p_2$) and add 1/2 to the numerator of the
smaller proportion. Thus, instead of $p_2$ we use $p_2' = 53.5/402 = 0.1331$
and instead of $p_1$ we use $p_1' = 117.5/1067 = 0.1101$. The denominator of
$z_c$ remains the same, giving $z_c = (0.1101 - 0.1331)/0.01877 = -1.225$.
You may verify that, apart from rounding errors, $z^2 = \chi^2$ and $z_c^2 = \chi_c^2$.
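The z and $z_c$ calculations can be sketched in Python as follows; the helper name and its `continuity` flag are our own. Without intermediate rounding it gives z ≈ −1.315 and $z_c$ ≈ −1.223, whose squares (about 1.728 and 1.497) reproduce the uncorrected and corrected $\chi^2$ for these data. The −1.31 and −1.225 in the text reflect rounding $p_1$ and $p_2$ to four decimals.

```python
from math import sqrt


def z_two_proportions(x1, n1, x2, n2, continuity=False):
    """Normal deviate for p1 - p2, with the pooled p in the standard error.
    If continuity=True, 1/2 is subtracted from the numerator of the larger
    proportion and added to the numerator of the smaller one."""
    p = (x1 + x2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # unchanged by the correction
    if continuity:
        if x1 / n1 > x2 / n2:
            x1, x2 = x1 - 0.5, x2 + 0.5
        elif x1 / n1 < x2 / n2:
            x1, x2 = x1 + 0.5, x2 - 0.5
    return (x1 / n1 - x2 / n2) / se


z = z_two_proportions(117, 1067, 54, 402)
zc = z_two_proportions(117, 1067, 54, 402, continuity=True)
print(round(z, 3), round(zc, 3), round(z ** 2, 3), round(zc ** 2, 3))
```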

If the null hypothesis has been rejected and you wish to find confidence
limits for the population difference $p_1 - p_2$, the standard error of $p_1 - p_2$
should be computed as

$$\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

The s.e. given by the null hypothesis is no longer valid. Often the change
is small, but it can be material if $n_1$ and $n_2$ are very unequal.
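A short Python sketch of these limits, using the unpooled standard error just given; the function name is ours, and the normal deviate comes from the standard library's `statistics.NormalDist` rather than a printed table. Applied to the canning data of example 8.12.2 it gives about 1.42% to 8.77%, matching that example's answer up to rounding in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_limits(x1, n1, x2, n2, level=0.95):
    """Confidence limits for p1 - p2 using the unpooled standard error
    sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95% limits
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Example 8.12.2: 378 of 392 rural and 274 of 300 urban families had canned.
lo, hi = diff_confidence_limits(378, 392, 274, 300)
print(f"{100 * lo:.2f}% to {100 * hi:.2f}%")  # 1.42% to 8.77%
```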

EXAMPLE 8.12.1: Apply the z test and the $z_c$ test to the data on breast cancer given
in example 8.10.1 and verify that $z^2 = \chi^2$ and $z_c^2 = \chi_c^2$. Note: when calculating z or $z_c$
it is often more convenient to express $p_1$, $p_2$ and p as percentages. Just remember that in
this event, $q = 100 - p$.

EXAMPLE 8.12.2: In 1943 a sample of about 1 in 1,000 families in Iowa was asked
about the canning of fruits or vegetables during the preceding season. Of the 392 rural
families, 378 had done canning, while of the 300 urban families, 274 had canned. Calculate
95% confidence limits for the difference in the percentages of rural and urban families who
had canned. Ans. 1.42% and 8.78%.

The preceding $\chi^2$ and z methods are approximate, the approximation
becoming poorer as the sample size decreases. Fisher (14) has shown
how to compute an exact test of significance. For accurate work the exact
test should be used if (i) the total sample size N is less than 20, or (ii) N
lies between 20 and 40 and the smallest expected number is less than 5.
For those who encounter these conditions frequently, reference (15),
which gives tables of the exact tests covering these cases, is recommended.
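Where tables are not at hand, the exact test can be computed directly. The Python sketch below is our own and is not the procedure behind reference (15): with all four margins fixed, the count in cell (1, 1) follows a hypergeometric distribution, and the two-sided P is taken as the total probability of all tables no more probable than the observed one. That two-sided rule is one common convention among several. Exact rational arithmetic (`fractions.Fraction`) avoids spurious ties from floating-point rounding.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Exact two-sided P for the 2 x 2 table [[a, b], [c, d]], margins fixed.

    With all four margins fixed, cell (1, 1) follows a hypergeometric
    distribution. P is the total probability of all tables that are no
    more probable than the observed one (one common two-sided convention).
    """
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):  # exact probability that cell (1, 1) equals x
        return Fraction(comb(r1, x) * comb(n - r1, c1 - x), comb(n, c1))

    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    p_obs = prob(a)
    return float(sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs))


# A tiny table where the answer is easy to check by hand: P = 2/20 = 0.1.
print(fisher_exact_two_sided(3, 0, 0, 3))
```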

8.13 — Sample size for comparing two proportions. The question : 
How large a sample do I need? is naturally of great interest to investigators. 
For comparing two means, an approach that is often helpful was given 
in section 4.13, p. 111. This should be reviewed carefully, since the same 
principle applies to the comparison of two proportions. The approach 
assumes that it is planned to make a test of significance of the difference 
between the two proportions, and that future actions will depend on
whether the test shows a significant difference or not. Consequently, if the
true difference $p_2 - p_1$ is as large as some amount $\delta$ chosen by the investigator, he would like the test to have a high probability $P'$ of declaring a significant result.

For two independent samples, formula (4.13.1) (p. 113) for n, the size
of each sample, can be applied. Put $\delta = p_2 - p_1$ and $\sigma_D^2 = p_1 q_1 + p_2 q_2$.
This gives

$$n = (Z_\alpha + Z_\beta)^2 (p_1 q_1 + p_2 q_2)/(p_2 - p_1)^2 \qquad (8.13.1)$$

where $Z_\alpha$ is the normal deviate corresponding to the significance level to
be used in the test, $\beta = 2(1 - P')$, and $Z_\beta$ is the normal deviate corresponding to the two-tailed probability $\beta$. Table 4.13.1 gives $(Z_\alpha + Z_\beta)^2$ for the
commonest values of $\alpha$ and $\beta$. In using this formula, we substitute the
best advance estimate of $(p_1 q_1 + p_2 q_2)$ in the numerator.

For instance, suppose that a standard antibiotic has been found to 
protect about 50% of experimental animals against a certain disease. 
Some new antibiotics become available that seem likely to be superior. 
In comparing a new antibiotic with the standard, we would like a prob- 
ability P' = 0.9 of finding a significant difference in a one-tailed test 
at the 5% level if the new antibiotic will protect 80% of the animals in 
the population. For these conditions, table 4.13.1 gives $(Z_\alpha + Z_\beta)^2$ as 8.6.
Hence

$$n = (8.6)\{(50)(50) + (80)(20)\}/(30)^2 = 39.2$$

Thus, 40 animals should be used for each antibiotic. 
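Formula (8.13.1) is easy to evaluate directly. This Python sketch (function name hypothetical) takes exact normal deviates from the standard library's `statistics.NormalDist` instead of table 4.13.1, so for the antibiotic example it gives n ≈ 39.0 rather than 39.2; the difference comes from rounding $(Z_\alpha + Z_\beta)^2$ to 8.6, and either way 40 animals per antibiotic suffice.

```python
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, power=0.9, alpha=0.05, two_tailed=True):
    """Size n of each sample from formula (8.13.1):
    n = (Z_alpha + Z_beta)^2 (p1 q1 + p2 q2) / (p2 - p1)^2."""
    nd = NormalDist()  # standard normal
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)
    # Z_beta has two-tailed probability 2(1 - P'), i.e. it is Phi^{-1}(P').
    z_beta = nd.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2


# Antibiotic example: 50% vs 80% protected, one-tailed 5% test, P' = 0.9.
print(round(sample_size_two_proportions(0.5, 0.8, two_tailed=False), 1))  # 39.0
```

The same function reproduces the answers to examples 8.13.1 and 8.13.2 to within rounding.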

Some calculations of this type will soon convince you of the sad fact 
that large samples are necessary to detect small differences between two 
percentages. When resources are limited, it is sometimes wise, before 
going ahead with the experiment, to calculate the probability that a sig- 
nificant result will be found. Suppose that an experimenter is interested 
in the values $p_1 = 0.8$, $p_2 = 0.9$, but cannot make n > 100. If formula
(8.13.1) is solved for $Z_\beta$, we find

$$Z_\beta = \frac{(p_2 - p_1)\sqrt{n}}{\sqrt{p_1 q_1 + p_2 q_2}} - Z_\alpha = \frac{(0.1)(10)}{\sqrt{0.16 + 0.09}} - Z_\alpha = 2 - Z_\alpha$$

If he intends a two-tailed 5% test, $Z_\alpha \approx 2$, so that $Z_\beta \approx 0$. This gives
$\beta = 1$ and $P' = 1 - \beta/2 = 0.5$. The proposed experiment has only a
50-50 chance of finding a significant difference in this situation.
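Solving (8.13.1) for $Z_\beta$ turns it into a quick power calculation. In the Python sketch below (function name ours), $P' = \Phi(Z_\beta)$, which equals $1 - \beta/2$ as in the text. With $Z_\alpha = 1.96$ rather than 2, the example above gives $P'$ of about 0.52, still essentially a 50-50 chance.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, alpha=0.05, two_tailed=True):
    """Approximate P' from formula (8.13.1) solved for Z_beta:
    Z_beta = |p2 - p1| sqrt(n) / sqrt(p1 q1 + p2 q2) - Z_alpha."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)
    z_beta = abs(p2 - p1) * sqrt(n) / sqrt(p1 * (1 - p1) + p2 * (1 - p2)) - z_alpha
    # P' = Phi(Z_beta); for Z_beta >= 0 this equals 1 - beta/2 as in the text.
    return nd.cdf(z_beta)


print(round(power_two_proportions(0.8, 0.9, 100), 2))  # 0.52
```

With `two_tailed=False` and n = 25, the same function gives the answers to example 8.13.3.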

Formula (8.13.1), although a large-sample approximation, should be 
accurate enough for practical use, since there is usually some uncertainty
about the values of $p_1$ and $p_2$ to insert in the formula. Reference (6) gives
tables of n based on a more accurate approximation. 
EXAMPLE 8.13.1: One difficulty in estimating sample size in biological work is that
the proportions given by a standard treatment may vary over time. An experimenter has
found that his standard treatment has a failure rate lying between $p_1 = 30\%$ and $p_1 = 40\%$.
With a new treatment whose failure rate is 20 percentage points lower than the standard, what sample sizes
are needed to make $P' = 0.9$ in a two-tailed 5% test? Ans. n = 79 when $p_1 = 30\%$ and
n = 105 when $p_1 = 40\%$.

EXAMPLE 8.13.2: In planning the 1954 trial of the Salk poliomyelitis vaccine (7),
the question of sample size was critical, since it was unlikely that the trial could be repeated
and since an extremely large sample of children would obviously be necessary. Various estimates of sample size were therefore made. In one of these it was assumed that the probability
that an unprotected child would contract paralytic polio was 0.0003, or 0.03%. If the vaccine
was 50% effective (that is, decreased this probability to 0.00015, or 0.015%), it was desired
to have a 90% chance of finding a significant difference at the 5% level in a two-tailed test. How many
children are required? Ans. 210,000 in each group (vaccinated and unprotected).

EXAMPLE 8.13.3: An investigator has $p_1 = 0.4$, and usually conducts experiments
with n = 25. In a one-tailed test at the 5% level, what is the chance of obtaining a significant
result if (i) $p_2 = 0.5$, (ii) $p_2 = 0.6$? Ans. (i) 0.18, (ii) 0.42.


8.14 — The Poisson distribution. As we have seen, the binomial dis- 
tribution tends to the normal distribution as n increases for any fixed value 
of p. The value of n needed to make the normal approximation a good
one depends on the value of p, this value being smallest when p = 1/2.
For p < 1/2, a general rule, usually conservative, is that the normal approximation is adequate if the mean $\mu = np$ is greater than 15.

In many applications, however, we are studying rare events, so that
even if n is large, the mean np is much less than 15. The binomial distribution then remains noticeably skew and the normal approximation is unsatisfactory. A different approximation for such cases was developed by
S. D. Poisson (8). He worked out the limiting form of the binomial distribution when n tends to infinity and p tends to zero at the same time, in
such a way that $\mu = np$ is constant. The binomial expression for the
probability of r successes tends to the simpler form,

$$P(r) = \frac{\mu^r e^{-\mu}}{r!}, \qquad r = 0, 1, 2, \ldots$$

where e = 2.71828 is the base of natural logarithms. The initial terms in
the Poisson distribution are:

$$P(0) = e^{-\mu}; \quad P(1) = \mu e^{-\mu}; \quad P(2) = \frac{\mu^2}{2} e^{-\mu}; \quad P(3) = \frac{\mu^3}{6} e^{-\mu}$$

Table 8.14.1 shows in column (1) the Poisson distribution for $\mu = 1$.
The distribution is markedly skew. The mode (highest frequency) is at
either 0 or 1, these two having the same probability when $\mu = 1$. To give
an idea of the way in which the binomial tends to approach the Poisson,
column (2) shows the binomial distribution for n = 100, p = 0.01, and
column (3) the binomial for n = 25, p = 0.04, both of these having
np = 1. The agreement with the Poisson is very close for n = 100 and
TABLE 8.14.1

The Poisson Distribution for $\mu = 1$ Compared With the Binomial
Distributions for n = 100, p = 0.01 and n = 25, p = 0.04

                        Relative Frequencies
          (1)          (2)                   (3)
  r       Poisson      Binomial              Binomial
                       n = 100, p = 0.01     n = 25, p = 0.04
  0       0.3679       0.3660                0.3604
  1       0.3679       0.3697                0.3754
  2       0.1839       0.1849                0.1877
  3       0.0613       0.0610                0.0600
  4       0.0153       0.0149                0.0137
  5       0.0031       0.0029                0.0024
  6       0.0005       0.0005                0.0003
 ≥7       0.0001       0.0001                0.0000
Total     1.0000       1.0000                0.9999


quite close for n = 25. Tables of individual and cumulative terms of the
Poisson are given in (9) and of individual terms up to $\mu = 15$ in (10).
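Table 8.14.1 can be regenerated from the two formulas. The Python sketch below (function names ours) prints rows r = 0 to 6, which agree with the four-decimal values in the table.

```python
from math import comb, exp, factorial


def poisson_pmf(r, mu):
    """P(r) = mu^r e^(-mu) / r!"""
    return mu ** r * exp(-mu) / factorial(r)


def binomial_pmf(r, n, p):
    """Binomial probability of r successes in n trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


# Rows r = 0..6 of table 8.14.1; every column has np = mu = 1.
for r in range(7):
    print(r, f"{poisson_pmf(r, 1):.4f}", f"{binomial_pmf(r, 100, 0.01):.4f}",
          f"{binomial_pmf(r, 25, 0.04):.4f}")
```

Trying larger n with np held at 1 shows the binomial column closing in on the Poisson column, as the limiting argument predicts.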

The fitting of a Poisson distribution to a sample will be illustrated by 
the data (11) in table 8.14.2. These show the number of noxious weed 
seeds in 98 sub-samples of Phleum praetense (meadow grass). Each sub- 
sample weighed 1/4 ounce, and of course contained many seeds, of which 
only a small percentage were noxious. The first step is to compute the 
sample mean. 

$$\mu = (\sum fr)/(\sum f) = 296/98 = 3.0204 \text{ noxious seeds per sub-sample}$$


TABLE 8.14.2

Distribution of Number of Noxious Weed Seeds Found in N = 98
Sub-samples, With Fitted Poisson Distribution

Number of           Frequency    Poisson               Expected
Noxious Seeds r     f            Multipliers           Frequency
0                    3           1 = 1.0000              4.781
1                   17           $\mu$ = 3.0204         14.440
2                   26           $\mu/2$ = 1.5102       21.807
3                   16           $\mu/3$ = 1.0068       21.955
4                   18           $\mu/4$ = 0.7551       16.578
5                    9           $\mu/5$ = 0.6041       10.015
6                    3           $\mu/6$ = 0.5034        5.042
7                    5           $\mu/7$ = 0.4315        2.176
8                    0           $\mu/8$ = 0.3756        0.817
9                    1           $\mu/9$ = 0.3356        0.274
10                   0           $\mu/10$ = 0.3020       0.083
11 or more           0           $\mu/11$ = 0.2746       0.030
Total               98                                  97.998
Next, calculate the successive terms of the Poisson distribution with
mean $\mu$. The expected number of sub-samples with 0 seeds is
$Ne^{-\mu} = (98)(e^{-3.0204})$. A table of natural logs gives $e^{-3.0204} \approx 1/20.50$,
and $98/20.50 \approx 4.781$. Next, form a column of the successive multipliers
1, $\mu$, $\mu/2$, ..., as shown in table 8.14.2, recording each to at least four
significant digits. The expected number of sub-samples with r = 1 is
$(4.781)(\mu) = 14.440$. Similarly, the expected number with r = 2 is
$(14.440)(\mu/2) = (14.440)(1.5102) = 21.807$, and so on. The agreement between observed and expected frequencies seems good except perhaps for
r = 2 and r = 3, which have almost equal expected numbers but have observed numbers 26 and 16. A test of the discrepancies between observed
and expected numbers (section 9.6) shows that these can well be accounted
for by sampling errors.
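The fitting procedure just described can be sketched in Python. The code follows the text's chain of multiplications, starting from $Ne^{-\mu}$ and multiplying by $\mu/r$. Filling the "11 or more" class by subtracting from N is our own choice for illustration. For r = 0 through 7 this reproduces the expected frequencies of table 8.14.2 to within rounding; from r = 8 onward small differences in the third decimal appear.

```python
from math import exp

# Table 8.14.2: number of sub-samples with r = 0, 1, ..., 10 noxious seeds;
# the last entry (index 11) is the "11 or more" class.
freq = [3, 17, 26, 16, 18, 9, 3, 5, 0, 1, 0, 0]
N = sum(freq)                                    # 98 sub-samples
mu = sum(r * f for r, f in enumerate(freq)) / N  # 296/98 = 3.0204

# Start from N e^(-mu), then multiply successively by mu/r, as in the text.
expected = [N * exp(-mu)]
for r in range(1, 11):
    expected.append(expected[-1] * mu / r)
expected.append(N - sum(expected))               # "11 or more" by subtraction

for r, (f, F) in enumerate(zip(freq, expected)):
    print(r, f, round(F, 3))
```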

Two important properties hold for a Poisson variate. First, the variance
of the distribution is equal to its mean, $\mu$. This would be expected, since
the binomial variance, npq, tends to np when q tends to 1. Secondly, if a
series of independent variates $X_1, X_2, X_3, \ldots$ each follow Poisson distributions with means $\mu_1, \mu_2, \mu_3, \ldots$, their sum follows a Poisson distribution
with mean $(\mu_1 + \mu_2 + \mu_3 + \cdots)$.
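Both properties can be checked numerically. In the Python sketch below (function names ours), the variance is computed by direct summation over r = 0 to 100, where the neglected tail is negligible for the mean used. The sum property is checked by convolving two Poisson distributions, with means 1 and 2 chosen arbitrarily, and comparing the result with the Poisson distribution of mean 3. A numerical check of this kind illustrates the properties but does not prove them.

```python
from math import exp


def poisson_pmf_list(mu, r_max):
    """P(0), ..., P(r_max) for a Poisson distribution with mean mu,
    built with the recurrence P(r) = P(r - 1) * mu / r."""
    probs = [exp(-mu)]
    for r in range(1, r_max + 1):
        probs.append(probs[-1] * mu / r)
    return probs


def mean_and_variance(probs):
    """Mean and variance of a distribution on 0, 1, 2, ... given its probabilities."""
    mean = sum(r * p for r, p in enumerate(probs))
    second = sum(r * r * p for r, p in enumerate(probs))
    return mean, second - mean ** 2


# Property 1: variance equals mean (tail beyond r = 100 is negligible here).
print(mean_and_variance(poisson_pmf_list(3.0204, 100)))

# Property 2: convolving the distributions with means 1 and 2 (the distribution
# of the sum of independent variates) matches the Poisson with mean 3.
p1, p2 = poisson_pmf_list(1.0, 30), poisson_pmf_list(2.0, 30)
conv = [sum(p1[i] * p2[k - i] for i in range(k + 1)) for k in range(31)]
print(max(abs(x - y) for x, y in zip(conv, poisson_pmf_list(3.0, 30))))
```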

In the inspection and quality control of manufactured goods, the 
proportion of defective articles in a large lot should be small. Conse- 
quently, the number of defectives in the lot might be expected to follow a 
Poisson distribution. For this reason, the Poisson distribution plays an 
important role in the development of plans for inspection and quality 
control. Further, the Poisson is often found to serve remarkably well as 
an approximation when p is small, even if the value of n is ill-defined and 
if both n and p presumably vary from one sample to another. A much-quoted example of a good fit of a Poisson distribution, due to Bortkewitch, is to the number of men in a Prussian army corps who were killed
during a year by the kick of a horse. He had N = 200 observations, one
for each of 10 corps for each of 20 years. On any given day, some men
were exposed to a small probability of being kicked, but it is not clear what
value n has, nor that p would be constant.

The Poisson distribution can also be developed by reasoning quite 
unrelated to the binomial. Suppose that signals are being transmitted, 
and that the probability that a signal reaches a given point in a tiny time-interval t is $\lambda t$, irrespective of whether previous signals have arrived
recently or not. Then the number of signals arriving in a finite time-interval of length T may be shown to follow a Poisson distribution with
mean $\lambda T$ (example 8.14.4). Similarly, if particles are distributed at
random in a liquid with density k per unit volume, the number found in a
sample of volume V is a Poisson variable with mean kV. From these
illustrations it is not surprising that the Poisson distribution has found
applications in many fields, including communications theory and the 
estimation of bacterial densities. 

EXAMPLE 8.14.1: n = 1,000 independent trials are made of an event with probability
0.001 at each trial. Give approximate results for the chances that (i) the event does not
happen, (ii) the event happens twice, (iii) the event happens at least five times. Ans. (i) 0.368,
(ii) 0.184, (iii) 0.0037.

EXAMPLE 8.14.2: A. G. Arbous and J. E. Kerrich (12) report the numbers of accidents sustained during their first year by 155 engine shunters aged 31-35, as follows:

No. of accidents      0      1      2      3      4 or more
No. of men           80     61     13      1      0

Fit a Poisson distribution to these data. Note: the data were obtained as part of a study
of accident proneness. If some men are particularly liable to accidents, this would imply
that the Poisson would not be a good fit, since $\mu$ would vary from man to man.

EXAMPLE 8.14.3: Student (13) counted the number of yeast cells on each of 400
squares of a hemacytometer. In two independent samples, each of which gave a satisfactory
fit to a Poisson distribution, the total numbers of cells were 529 and 720. (i) Test whether
these totals are estimates of the same quantity, or in other words whether the density of
yeast cells per square is the same in the two populations. (ii) Compute 95% limits for the
difference in density per square. Ans. (i) z = 5.41, P very small. (ii) 0.30 to 0.65. Note: the
normal approximation to the Poisson distribution, or to the difference between two independent Poisson variates, may be used when the observed numbers exceed 15.

EXAMPLE 8.14.4: The Poisson process formula for the number of signals arriving in a
finite time-interval T requires one result in calculus, but is otherwise a simple application of
probability rules. Let P(r, T + t) denote the probability that exactly r signals have arrived
in the interval from time 0 to the end of time (T + t). This event can happen in one of two
mutually exclusive ways: (i) (r − 1) signals have arrived by time T, and one arrives in the
small interval t. The probability of these two events is $\lambda t P(r - 1, T)$. (ii) r signals have
already arrived by time T, and none arrives in the subsequent interval t. The probability
of these two events is $(1 - \lambda t) P(r, T)$. The interval t is assumed so small that more than one
signal cannot arrive in this interval. Hence,

$$P(r, T + t) = \lambda t P(r - 1, T) + (1 - \lambda t) P(r, T)$$

Rearranging, we have

$$\frac{P(r, T + t) - P(r, T)}{t} = \lambda \{P(r - 1, T) - P(r, T)\}$$

Letting t tend to zero, we get $dP(r, T)/dT = \lambda \{P(r - 1, T) - P(r, T)\}$. By differentiating,
it will be found that $P(r, T) = e^{-\lambda T} (\lambda T)^r / r!$ satisfies this equation.
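The last claim of example 8.14.4 can also be checked numerically. This Python sketch compares a central-difference estimate of $dP(r, T)/dT$ with $\lambda\{P(r - 1, T) - P(r, T)\}$. The rate $\lambda = 2$, the step h, and the grid of r and T values are arbitrary choices for illustration, and a numerical check does not replace the differentiation the example asks for.

```python
from math import exp, factorial

LAM = 2.0  # hypothetical arrival rate lambda, chosen only for the check


def P(r, T, lam=LAM):
    """Probability of exactly r arrivals by time T; P(-1, T) is taken as 0."""
    if r < 0:
        return 0.0
    return exp(-lam * T) * (lam * T) ** r / factorial(r)


def discrepancy(r, T, lam=LAM, h=1e-6):
    """|central-difference estimate of dP(r, T)/dT - lam * {P(r-1, T) - P(r, T)}|."""
    dP_dT = (P(r, T + h, lam) - P(r, T - h, lam)) / (2 * h)
    return abs(dP_dT - lam * (P(r - 1, T, lam) - P(r, T, lam)))


worst = max(discrepancy(r, T) for r in range(6) for T in (0.5, 1.0, 2.5))
print(worst)  # tiny: limited only by floating-point and step-size error
```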
CHAPTER NINE


Attribute data with more than
one degree of freedom


9.1 — Introduction. In chapter 8 the discussion of attribute data was 
confined to the cases in which the population contains only two classes of 
individuals and in which only one or two populations have been sampled. 
We now extend the discussion to populations classified into more than 
two classes, and to samples drawn from more than two populations. 
Section 9.2 considers the simplest situation in which the expected numbers 
in the classes are completely specified by the null hypothesis. 

9.2 — Single classifications with more than two classes. In crosses 
between two types of maize, Lindstrom (1) found four distinct types of
plants in the second generation. In a sample of 1,301 plants, there were

$f_1$ = 773 green
$f_2$ = 231 golden
$f_3$ = 238 green-striped
$f_4$ = 59 golden-green-striped
      1,301

According to a simple type of Mendelian inheritance, the probabilities 
of obtaining these four types of plants are 9/16, 3/16, 3/16, and 1/16, 
respectively. We select this as the null hypothesis. 

The $\chi^2$ test in chapter 8 is applicable to any number of classes. Accordingly, we calculate the numbers of plants that would be expected in
the four classes if the null hypothesis were true. These numbers, and the
deviations $(f - F)$, are shown below.


$F_1 = (9/16)(1301) = 731.9$        $f_1 - F_1 = +41.1$
$F_2 = (3/16)(1301) = 243.9$        $f_2 - F_2 = -12.9$
$F_3 = (3/16)(1301) = 243.9$        $f_3 - F_3 = -5.9$
$F_4 = (1/16)(1301) = 81.3$         $f_4 - F_4 = -22.3$
Total             1301.0                          0.0


Substituting in the formula for chi-square,

$$\chi^2 = \sum \frac{(f - F)^2}{F} = \frac{(41.1)^2}{731.9} + \frac{(-12.9)^2}{243.9} + \frac{(-5.9)^2}{243.9} + \frac{(-22.3)^2}{81.3}$$

$$= 2.31 + 0.68 + 0.14 + 6.12 = 9.25$$


In a test of this type, the number of degrees of freedom in $\chi^2$ is (Number of classes) − 1 = 4 − 1 = 3. To remember this rule, note that there
are four deviations, one for each class. However, the sum of the four
deviations, 41.1 − 12.9 − 5.9 − 22.3, is zero. Only three of the deviations can vary at will, the fourth being fixed as zero minus the sum of the
first three.
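The goodness-of-fit calculation generalizes to any number of classes. The Python sketch below (function name ours) scales hypothetical ratios such as 9:3:3:1 to the observed total and returns $\chi^2$ with its degrees of freedom. Because it keeps the expected numbers unrounded (731.8125 rather than 731.9, and so on), it returns about 9.27 rather than the 9.25 obtained above. The value still lies between the 5% and 2.5% points for 3 d.f.

```python
def chi_square_gof(observed, ratios):
    """Chi-square = sum (f - F)^2 / F, with expected numbers F taken from
    hypothetical ratios (e.g. 9:3:3:1) scaled to the observed total."""
    n = sum(observed)
    total_ratio = sum(ratios)
    expected = [n * r / total_ratio for r in ratios]
    chi2 = sum((f - F) ** 2 / F for f, F in zip(observed, expected))
    return chi2, len(observed) - 1  # statistic and degrees of freedom


chi2, df = chi_square_gof([773, 231, 238, 59], [9, 3, 3, 1])
print(round(chi2, 2), df)  # 9.27 3
```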

Is $\chi^2$ as large as 9.25, with d.f. = 3, a common event in sampling from
the population specified by the null hypothesis 9:3:3:1, or is it a rare one?
For the answer, refer to the $\chi^2$ table (table A 5, p. 550), in the line for
3 d.f. You will find that 9.25 is beyond the 5% point, near the 2.5% point.
On this evidence the null hypothesis would be rejected.

When there are more than two classes, this $\chi^2$ test is usually only a
first step in the examination of the data. From the test we have learned
that the deviations between observed and expected numbers are too large
to be reasonably attributed to sampling fluctuations. But the $\chi^2$ test does
not tell us in what way the observed and expected numbers differ. For
this, we must look at the individual deviations and their contributions to
$\chi^2$. Note that the first class (green) gives a large positive deviation, +41.1,
and is the only class giving a positive deviation. Among the other classes,
the last class (golden-green-striped) gives the largest deviation, −22.3,
and the largest contribution to $\chi^2$, 6.12 out of a total of 9.25. Lindstrom
commented that the deviations could be largely explained by a physiological cause, namely the weakened condition of the last three classes due
to their chlorophyll abnormality. He pointed out in particular that the
last class (golden-green-striped) was not very vigorous.

To illustrate the type of subsequent analysis that is often necessary 
with more than two classes, let us examine whether the data are consistent 
with the weaker hypothesis that the numbers in the first three classes are 
in the predicted Mendelian ratios 9:3:3. If so, one interpretation of the 
results is that the significant value of $\chi^2$ can be attributed to poor survivorship of the golden-green-striped class.

The 9:3:3 hypothesis is tested by a $\chi^2$ test applied to the first three
classes. The calculations appear in table 9.2.1.

In the first class, $F_1 = (0.6)(1242) = 745.2$, and so on. The value of
$\chi^2$ is now 2.70, with 3 − 1 = 2 d.f. Table A 5 shows that the probability
is about 0.25 of obtaining a $\chi^2$ as large as this when there are 2 d.f.

We can also test whether the last class (golden-green-striped) has a
frequency of occurrence significantly less than would be expected from
its Mendelian probability 1/16. For this we observe that 1242 plants fell
TABLE 9.2.1

Test of the Mendelian Hypothesis in the First Three Classes

Class             f      Hypothetical     F        f − F     (f − F)²/F
                         Probability
green            773     9/15 = 0.6      745.2     +27.8       1.04
golden           231     3/15 = 0.2      248.4     −17.4       1.22
green-striped    238     3/15 = 0.2      248.4     −10.4       0.44
Total           1242     15/15 = 1      1242.0       0.0       2.70


into the first three classes, which have total probability 15/16, as against
59 plants in the fourth class, with probability 1/16. The corresponding
expected numbers are 1219.7 and 81.3. In this case the $\chi^2$ test reduces to
that given in section 8.8 for testing a theoretical binomial proportion. We
have

$$\chi^2 = \frac{(1242 - 1219.7)^2}{1219.7} + \frac{(59 - 81.3)^2}{81.3} = \frac{(+22.3)^2}{1219.7} + \frac{(-22.3)^2}{81.3} = 6.53,$$

with 1 d.f. The significance probability is close to the 1% level.
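The two follow-up tests are the same goodness-of-fit calculation applied to sub-groups of the classes. In this self-contained Python sketch (function name ours), the 9:3:3 comparison gives about 2.69 with 2 d.f.; the 2.70 of table 9.2.1 is the sum of contributions rounded to two decimals. The 15:1 comparison gives about 6.53 with 1 d.f.

```python
def chi_square_gof(observed, ratios):
    """Chi-square and d.f. for observed counts against hypothetical ratios."""
    n, t = sum(observed), sum(ratios)
    expected = [n * r / t for r in ratios]
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected)), len(observed) - 1


# (a) First three classes against 9:3:3 (table 9.2.1).
chi_a, df_a = chi_square_gof([773, 231, 238], [9, 3, 3])
# (b) First three classes pooled (15/16) against the fourth class (1/16).
chi_b, df_b = chi_square_gof([773 + 231 + 238, 59], [15, 1])
print(round(chi_a, 2), df_a, round(chi_b, 2), df_b)
```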

To summarize, the high value of $\chi^2$ obtained initially, 9.25 with 3
d.f., can be ascribed to a deficiency in the number of golden-green-striped
plants, the other three classes not deviating abnormally from the Mendelian probabilities. (There may be also, as Lindstrom suggests, some deficiencies in the second and third classes relative to the first class, which
would show up more definitely in a larger sample.)

This device of making comparisons among sub-groups of the classes
is useful in two situations. Sometimes, especially in exploratory work, the
investigator has no clear ideas about the way in which the numbers in the
classes will deviate from the initial null hypothesis: indeed, he may consider it likely that his first $\chi^2$ test will support the null hypothesis. The
finding of a significant $\chi^2$ should be followed, as in the above example,
by inspection of the deviations to see what can be learned from them.
This process may lead to the construction of new hypotheses that are
tested by further $\chi^2$ tests among sub-groups of the classes. Conclusions
drawn from this analysis must be regarded as tentative, because the new
hypotheses were constructed after seeing the data and should be strictly
tested by gathering new data.

In the second situation the investigator has some ideas about the
types of departure that the data are likely to show from the initial null
hypothesis; in other words, about the nature of the alternative hypothesis.
The best procedure is then to construct tests aimed specifically at these
types of departure. Often, the initial $\chi^2$ test is omitted in this situation.
This approach will be illustrated in later sections.

When calculating $\chi^2$ with more than 1 d.f., it is not worthwhile to
make a correction for continuity. The exact distribution of $\chi^2$ is still
discrete, but the number of different possible values of $\chi^2$ is usually large,
so that the correction, when properly made, produces only a small change
in the significance probability.

EXAMPLE 9.2.1 — In 193 pairs of Swedish twins (2), 56 were of type MM (both male), 72 of the type MF (one male, one female), and 65 of the type FF. On the hypothesis that a twin is equally likely to be a boy or a girl and that the sexes of the two members of a twin pair are determined independently, the probabilities of MM, MF, and FF pairs are 1/4, 1/2, 1/4, respectively. Compute the value of χ² and the significance probability. Ans. χ² = 13.27, with 2 d.f. P < 0.005.

EXAMPLE 9.2.2 — In the preceding example we would expect the null hypothesis to be false for two reasons. First, the probability that a twin is male is not exactly 1/2. This discrepancy produces only minor effects in a sample of size 193. Secondly, identical twins are always of the same sex. The presence of identical twins decreases the probability of MF pairs and increases the probabilities of MM and FF pairs. Construct χ² tests to answer the questions: (i) Are the relative numbers of MM and FF pairs (ignoring the MF pairs) in agreement with the null hypothesis? (ii) Are the relative numbers of twins of like sex (MM and FF combined) and unlike sex (MF) in agreement with the null hypothesis? Ans. (i) χ² (uncorrected) = 0.67, with 1 d.f. P > 0.25; (ii) χ² = 12.44, with 1 d.f. P very small. The failure of the null hypothesis is due, as anticipated, to an excess of twins of like sex.
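The three χ² values in examples 9.2.1 and 9.2.2 can be checked with a few lines of Python. This is a minimal sketch; the helper name `chi_square` is our own, not a library function. At full precision the initial χ² is about 13.28 (the answer's 13.27 is the sum of the rounded contributions 1.24 + 6.22 + 5.81), and the two 1-d.f. values add to about 13.11, which illustrates the non-additivity noted in example 9.2.4.

```python
def chi_square(observed, expected):
    """Pearson chi-square: sum of (f - F)^2 / F over the classes."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))


mm, mf, ff = 56, 72, 65          # Swedish twin pairs, example 9.2.1
n = mm + mf + ff                 # 193 pairs

# Initial test, 2 d.f.: null probabilities 1/4, 1/2, 1/4
chi2_all = chi_square([mm, mf, ff], [n / 4, n / 2, n / 4])

# (i) MM versus FF, ignoring MF: each expected to be half of the like-sex pairs
like = mm + ff
chi2_mm_ff = chi_square([mm, ff], [like / 2, like / 2])

# (ii) like sex versus unlike sex: each has probability 1/2 under the null
chi2_like_unlike = chi_square([like, mf], [n / 2, n / 2])

print(round(chi2_all, 2), round(chi2_mm_ff, 2), round(chi2_like_unlike, 2))
```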

EXAMPLE 9.2.3 — In section 1.14, 230 samples from binomial distributions with known p were drawn, and χ² was computed from each sample. The observed and expected numbers of χ² values in each of seven classes (taken from table 1.14.1) are as follows:

Obs.   57     59     62     32     14     3     3     | 230
Exp.   57.5   57.5   57.5   34.5   11.5   9.2   2.3   | 230.0

Test whether the deviations of observed from expected numbers are of a size that occurs frequently by chance. Ans. χ² = 5.50, d.f. = 6, P about 0.5.
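The computation for example 9.2.3 is sketched below. To avoid depending on a statistics library, the significance probability uses the closed form of the χ² upper tail, which holds only for an even number of d.f.; `chi2_sf_even_df` is our own helper. Summing unrounded contributions gives χ² ≈ 5.51 (the answer quotes 5.50) and P ≈ 0.48.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square >= x) for an EVEN number of d.f., from the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("closed form requires an even number of d.f.")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


observed = [57, 59, 62, 32, 14, 3, 3]
expected = [57.5, 57.5, 57.5, 34.5, 11.5, 9.2, 2.3]
chi2 = sum((f - F) ** 2 / F for f, F in zip(observed, expected))
df = len(observed) - 1          # expectations fully specified: 7 classes - 1
p = chi2_sf_even_df(chi2, df)
print(round(chi2, 2), df, round(p, 2))
```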

EXAMPLE 9.2.4 — In the Lindstrom example in the text, we had χ² (3 d.f.) = 9.25. This was followed by χ² (2 d.f.) = 2.70, which compared the first three classes, and χ² (1 d.f.) = 6.53, which compared the combined first three classes with the fourth class. Note that 2.70 + 6.53 = 9.23, while the initial χ² = 9.25. In examples 9.2.1 and 9.2.2, the initial χ² = 13.27, while the sum of the two 1-d.f. chi-squares is 0.67 + 12.44 = 13.11. When a classification is divided into sub-groups and a χ² is computed within each sub-group, plus a χ² which compares the total frequencies in the sub-groups, the d.f. add up to the d.f. in the initial χ², but the values of χ² do not add up exactly to the initial χ². They usually add to a value that is fairly close, and worth noting as a clue to mistakes in calculation.

9.3 — Single classifications with equal expectations. Often, the null hypothesis specifies that all the classes have equal probabilities. In this case, χ² has a particularly simple form. As before, let f_i denote the observed frequency in the ith class, and let n = Σf_i be the total size of sample. If there are k classes, the null hypothesis probability that a member of the population falls into any class is p = 1/k. Consequently, the expected frequency F_i in any class is np = n/k = f̄, the mean of the f_i. Thus,

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - F_i)^2}{F_i} = \sum_{i=1}^{k} \frac{(f_i - \bar{f})^2}{\bar{f}},$$

with (k − 1) d.f.
This test is applied to any new table of random numbers. The basic property of such a table is that each digit has a probability 1/10 of being chosen at each draw. To illustrate the test, the frequencies of the first 250 digits in the random number table A 1 are as follows:

Digit   0    1    2    3    4    5    6    7    8    9    Total
f       22   24   28   23   18   33   29   17   31   25   250


Only 17 sevens and 18 fours have appeared, as against 31 eights and 33 fives. The mean frequency f̄ = 25. Thus, by the usual shortcut method of computing the sum of squares of deviations, Σ(f_i − f̄)², given in section 2.10,

$$\chi^2 = \frac{1}{25}\left[(22)^2 + (24)^2 + \cdots + (25)^2 - \frac{(250)^2}{10}\right] = 10.08,$$

with 9 d.f. Table A 5 shows that the probability of a χ² as large as this lies between 0.5 and 0.3: χ² is not unusually large.
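The shortcut calculation for the 250 random digits, as a minimal Python sketch (variable names are ours):

```python
# Frequencies of the digits 0-9 among the first 250 digits of table A 1
freqs = [22, 24, 28, 23, 18, 33, 29, 17, 31, 25]
k = len(freqs)
n = sum(freqs)
fbar = n / k                                   # common expectation, 25

# Shortcut sum of squares of deviations: sum f^2 - (sum f)^2 / k
ss = sum(f * f for f in freqs) - n * n / k
chi2 = ss / fbar
print(n, fbar, round(chi2, 2), k - 1)          # 250 25.0 10.08 9
```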

This test can be related to the Poisson distribution. Suppose that the f_i are the numbers of occurrences of some rare event in a series of k independent samples. The null hypothesis is that the f_i all follow Poisson distributions with the same mean μ. Then, as shown by Fisher, the quantity Σ(f_i − f̄)²/f̄ is distributed approximately as χ² with (k − 1) d.f. To go a step further, the test can be interpreted as a comparison of the observed variance of the f_i with the variance that would be expected from the Poisson distribution. In the Poisson distribution, the variance equals the mean μ, of which the sample estimate is f̄. The observed variance among the f_i is s² = Σ(f_i − f̄)²/(k − 1). Hence

$$\chi^2 = (k - 1)\,\frac{\text{observed variance}}{\text{Poisson variance}} = \frac{(k - 1)s^2}{\bar{f}}$$

This χ² test is sensitive in detecting the alternative hypothesis that the f_i follow independent Poisson distributions with different means μ_i. Under this alternative, the expected value of χ² may be shown to be, approximately,

$$E(\chi^2) = (k - 1) + \sum_{i=1}^{k} \frac{(\mu_i - \bar{\mu})^2}{\bar{\mu}},$$

where μ̄ is the mean of the μ_i. If the null hypothesis holds, μ_i = μ̄ and χ² has its usual average value (k − 1). But any differences among the μ_i increase the expected value of χ² and tend to make it large. The test is sometimes called a variance test of the homogeneity of the Poisson distribution.

Sometimes the number of Poisson samples k is large. When computing the variance, time may be saved by grouping the observations, particularly if they take only a limited number of distinct values. To avoid confusion in our notation, denote the numbers of occurrences by y_i instead of f_i, since we have used f's in previous chapters to denote the frequencies found in a grouped sample. In this notation,

$$\chi^2 = \sum_{i=1}^{k} \frac{(y_i - \bar{y})^2}{\bar{y}} = \sum_{j=1}^{m} \frac{f_j\,(y_j - \bar{y})^2}{\bar{y}},$$

where the second sum is over the m distinct values of y, and f_j is the frequency with which the jth value of y appears in the sample. The d.f. are, as before, (k − 1).
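A sketch of the grouped calculation, applied to the potentilla counts of example 9.3.2 (variable names are ours). Only the m = 8 distinct values of y and their frequencies f_j are needed. At full precision it gives χ² ≈ 145.25, a little below the 145.4 quoted in the answer to that example; the difference presumably comes from rounding in the original hand computation.

```python
# Example 9.3.2 (Leggatt): distinct numbers of seeds y_j per batch and the
# number of batches f_j showing each value
values = [0, 1, 2, 3, 4, 5, 6, 7]
batches = [37, 32, 16, 9, 2, 0, 1, 1]

k = sum(batches)                                        # 98 Poisson samples
total = sum(f * y for f, y in zip(batches, values))     # 112 seeds in all
ybar = total / k
# Grouped sum of squares: sum f_j y_j^2 - (sum f_j y_j)^2 / k
ss = sum(f * y * y for f, y in zip(batches, values)) - total ** 2 / k
chi2 = ss / ybar
print(k, round(ybar, 4), round(chi2, 2), k - 1)
```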

If the d.f. in χ² lie beyond the range covered in table A 5, calculate the approximate normal deviate

$$Z = \sqrt{2\chi^2} - \sqrt{2(\text{d.f.}) - 1} \qquad (9.3.1)$$

The significance probability is read from the normal table, using only one tail. For an illustration of this case, see examples 9.3.2 and 9.3.3.

EXAMPLE 9.3.1 — In 1951, the numbers of babies born with a harelip in Birmingham, England, are quoted by Edwards (3) as follows:

Month    Jan  Feb  Mar  Apr  May  June  July  Aug  Sept  Oct  Nov  Dec
Number   8    19   11   12   16   8     7     5    8     3    8    8

Test the null hypothesis that the probability of a baby with harelip is the same in each month. Ans. χ² = 23.5, d.f. = 11, P between 0.025 and 0.01. Strictly, the variable that should be examined in studies of this type is the ratio (number of babies with harelip)/(total number of babies born), because even if this ratio is constant from month to month, the actual number of babies with harelip will vary if the total number born varies. Edwards points out that in these data the total number varies little and shows no relation to the variation in number with harelip. He proceeds to fit the above data by a periodic (cosine) curve, which indicates a maximum in March.
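Example 9.3.1 can be worked as a Poisson variance test, using the relation χ² = (k − 1)(observed variance)/(Poisson variance). The sketch relies on Python's standard `statistics` module, whose `variance` uses the divisor k − 1; it gives χ² ≈ 23.46 with 11 d.f.

```python
import statistics

# Births with harelip, Birmingham 1951, January to December
counts = [8, 19, 11, 12, 16, 8, 7, 5, 8, 3, 8, 8]
k = len(counts)
mean = statistics.mean(counts)        # estimate of the Poisson variance
s2 = statistics.variance(counts)      # observed variance, divisor k - 1

# chi-square = (k - 1) * (observed variance) / (Poisson variance)
chi2 = (k - 1) * s2 / mean
print(round(chi2, 1), k - 1)
```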

EXAMPLE 9.3.2 — Leggatt (4) counted the number of seeds of the weed potentilla found in 98 quarter-ounce batches of the grass Phleum praetense. The 98 numbers varied from 0 to 7, and were grouped into the following frequency distribution:

Number of seeds     0    1    2    3    4    5    6    7    Total
Number of batches   37   32   16   9    2    0    1    1    98

Calculate χ². Ans. χ² = 145.4, with 97 d.f. From table A 5, with 100 d.f., P is clearly less than 0.005. The high value of χ² is due to the batches with six and seven seeds.

EXAMPLE 9.3.3 — Compute the significance probability in the preceding example by finding the normal deviate Z given by equation 9.3.1. Ans. Z = 3.16, P = 0.0008. The correct probability, found from a larger table of χ², is P = 0.0011.
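Equation 9.3.1 and the one-tailed normal probability can be evaluated with the standard library's `math.erfc`; the function name `normal_deviate` is our own. With χ² = 145.4 and 97 d.f. this gives Z ≈ 3.16 and P ≈ 0.0008, as in example 9.3.3.

```python
import math


def normal_deviate(chi2, df):
    """Equation 9.3.1: Z = sqrt(2 chi^2) - sqrt(2 d.f. - 1)."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)


z = normal_deviate(145.4, 97)
p_one_tail = 0.5 * math.erfc(z / math.sqrt(2))   # upper tail of N(0, 1)
print(round(z, 2), round(p_one_tail, 4))
```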

9.4 — Additional tests. As in section 9.2, the χ² test for the Poisson distribution can be supplemented or replaced by other tests directed more specifically against the type of alternative hypothesis that the investigator has in mind. If it is desired to examine whether a rare meteorological event occurs more frequently in the summer months, we might compare the total frequency in June, July, and August with the total frequency in the rest of the year, the null hypothesis probabilities being very close to 1/4 and 3/4. If a likely alternative hypothesis is that an event shows a slow but steady increase or decrease in frequency over a period of nine years, construct a variate X_i = 1, 2, 3, ..., 9, or alternatively −4, −3, −2, ..., +3, +4 (making X̄ = 0), to represent the years. The average change in the f_i per year is estimated by the regression coefficient Σf_ix_i/Σx_i², where as usual x_i = X_i − X̄. The value of χ² for testing this coefficient, against the null hypothesis that there is no change, is

$$\chi^2 = \frac{\left(\sum f_i x_i\right)^2}{\bar{f} \sum x_i^2},$$

with 1 d.f.
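A sketch of this trend test follows. The function name `trend_chi2` is hypothetical, and the toy series 1, 2, ..., 9 is made up purely so the arithmetic can be checked by hand (Σf_ix_i = Σx_i² = 60 and f̄ = 5, so χ² = 60²/(5 × 60) = 12); it is not data from the text. The code centres the years, x_i = X_i − X̄, so it works for any number of equally spaced years.

```python
def trend_chi2(counts):
    """Regression of Poisson counts on equally spaced time points.

    Returns (slope, chi2): slope = sum(f x) / sum(x^2) estimates the average
    change per period, and chi2 = (sum f x)^2 / (fbar * sum x^2) tests it.
    """
    k = len(counts)
    xs = [i - (k - 1) / 2 for i in range(k)]       # x_i = X_i - Xbar
    fbar = sum(counts) / k
    sfx = sum(f * x for f, x in zip(counts, xs))
    sxx = sum(x * x for x in xs)
    return sfx / sxx, sfx ** 2 / (fbar * sxx)


slope, chi2 = trend_chi2([1, 2, 3, 4, 5, 6, 7, 8, 9])   # toy series only
print(slope, chi2)
```

A series with no change, such as nine equal counts, gives a slope and χ² of zero.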

Another example is found in an experiment designed to investigate various treatments for the control of cabbage loopers (insect larvae) (5). Each treatment was tested on four plots. Table 9.4.1 shows, for five of the treatments, the numbers of loopers counted on each plot. The objective of the analysis is to examine whether the treatments produced differences in the average number of loopers per plot.


TABLE 9.4.1
Number of Loopers on 50 Cabbage Plants in a Plot
(Four plots treated alike; five treatments)

Treatment   No. of Loopers Per Plot   Plot Total   Plot Mean   χ²      d.f.
1           11, 4, 4, 5               24           6.00        5.67    3
2           6, 4, 3, 6                19           4.75        1.42    3
3           8, 6, 4, 11               29           7.25        3.69    3
4           14, 27, 8, 18             67           16.75       11.39   3
5           7, 4, 9, 14               34           8.50        6.24    3
Total                                 173                      28.41   15


Since the sum of a number of independent Poisson variables also follows a Poisson distribution (section 8.14), we can compare the treatment totals by the Poisson variance test, provided we can adopt the assumption that the counts on plots treated alike follow the same Poisson distribution. To test this assumption, the χ² values for each treatment are computed in table 9.4.1 (second column from the right). Although only one of the five χ² values is significant at the 5% level, their total, 28.41, d.f. = 15, gives P of about 0.02. This finding invalidates the use of the Poisson variance test for the comparison of treatment totals. Some additional source of variation is present, which must be taken into account when investigating whether plot means differ from treatment to treatment. Problems of this type, which are common, are handled by the technique known as the analysis of variance. The analysis of these data is completed in example 10.3.3, p. 263.
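The per-treatment χ² values of table 9.4.1 and their total can be reproduced as below (an illustrative sketch; the helper name is ours). Summed at full precision the total is 28.40; the table's 28.41 is the sum of the rounded entries.

```python
loopers = {                     # loopers on each of four plots, table 9.4.1
    1: [11, 4, 4, 5],
    2: [6, 4, 3, 6],
    3: [8, 6, 4, 11],
    4: [14, 27, 8, 18],
    5: [7, 4, 9, 14],
}


def poisson_dispersion_chi2(counts):
    """Sum of (f - fbar)^2 / fbar, with len(counts) - 1 d.f."""
    fbar = sum(counts) / len(counts)
    return sum((f - fbar) ** 2 for f in counts) / fbar


per_treatment = {t: poisson_dispersion_chi2(c) for t, c in loopers.items()}
pooled = sum(per_treatment.values())
pooled_df = sum(len(c) - 1 for c in loopers.values())
for t, value in per_treatment.items():
    print(t, round(value, 2))
print("total", round(pooled, 2), pooled_df)
```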
Incidentally, the Poisson variance χ² for comparing the treatment totals would be computed as

$$\chi^2 = \frac{\sum (f_i - \bar{f})^2}{\bar{f}} = \frac{(24)^2 + (19)^2 + \cdots + (34)^2 - (173)^2/5}{34.6} = 41.5,$$

with 4 d.f. The high value of this χ² suggests that the variation between treatments is substantially greater than the variation within treatments, the point to be examined in the analysis of variance test.
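The Poisson variance χ² for the treatment totals, computed with the same shortcut (a sketch; variable names are ours):

```python
totals = [24, 19, 29, 67, 34]          # treatment totals from table 9.4.1
k = len(totals)
mean = sum(totals) / k                 # 34.6
chi2 = (sum(t * t for t in totals) - sum(totals) ** 2 / k) / mean
print(round(chi2, 1), k - 1)
```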

EXAMPLE 9.4.1 — In section 8.4, random numbers were used to draw 100 samples from the binomial n = 5, p = 0.2. The observed and expected frequencies (taken from table 8.4.1) are as follows:

No. of Successes     0       1       2       3      4      5      Total
Observed frequency   32      44      17      6      1      0      100
Expected frequency   32.77   40.96   20.48   5.12   0.64   0.03   100.00

Compute χ² and test whether the deviations can be accounted for by sampling errors. Ans. χ² = 1.09, d.f. = 3, P about 0.75. (Combine classes 3, 4, 5 before computing χ².)
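Example 9.4.1 as a sketch: classes 3, 4 and 5 are pooled, and since p = 0.2 was given in advance no parameter is estimated, so d.f. = 4 − 1 = 3.

```python
observed = [32, 44, 17, 6, 1, 0]
expected = [32.77, 40.96, 20.48, 5.12, 0.64, 0.03]

# Pool the classes with 3, 4 and 5 successes so that no expectation is small
obs_c = observed[:3] + [sum(observed[3:])]      # [32, 44, 17, 7]
exp_c = expected[:3] + [sum(expected[3:])]      # last class about 5.79
chi2 = sum((f - F) ** 2 / F for f, F in zip(obs_c, exp_c))
df = len(obs_c) - 1        # p = 0.2 was given, so no parameter is estimated
print(round(chi2, 2), df)
```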


9.5 — The χ² test when the expectations are small. The χ² test is a large-sample approximation, based on the assumption that the distributions of the observed numbers f_i (or y_i) in the classes are not far from normal. This assumption fails when some or all of the observed numbers are very small. Historically, the advice most often given was that the expected number in any class should not be less than 5, and that, if necessary, neighboring classes should be combined to meet this requirement. Later research, described in (6), showed that this restriction is too strict. Moreover, the combination of classes weakens the sensitivity of the χ² test.

We suggest that the χ² test is accurate enough if the smallest expectation is at least 1, and that classes be combined only to ensure this condition. This recommendation applies to the χ² tests of single classifications described in sections 9.2, 9.3, and 9.4. When counting the d.f. in χ², the number of classes is the number after any necessary combinations have been made.

In more extreme cases it is possible to work out the exact distribution of χ². The probability that f_i observations fall in the ith class is given by the multinomial distribution

$$\frac{n!}{f_1!\, f_2! \cdots f_k!}\; p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k},$$

where the p_i are the probabilities specified by the null hypothesis. This distribution reduces to the binomial distribution when there are only two classes. This probability is evaluated, along with the value of χ², for every possible set of f_i with Σf_i = n.
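The exact calculation just described can be sketched by brute-force enumeration (function names are ours, and all p_i are assumed to be positive). It is only practical for small n and k, because the number of possible sets of f_i is C(n + k − 1, k − 1). The usage line checks the two-class case, where the multinomial reduces to the binomial: with 4 trials at p = 1/2, the outcomes (4, 0) and (0, 4) each have probability 1/16, so the exact P for an observed (4, 0) is 0.125.

```python
import math


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_chi2_p(observed, probs):
    """Exact P(chi-square >= observed value) when the f_i are multinomial
    with n = sum(observed) and class probabilities probs (all > 0)."""
    n = sum(observed)
    expected = [n * p for p in probs]

    def chi2(fs):
        return sum((f - F) ** 2 / F for f, F in zip(fs, expected))

    target = chi2(observed)
    p_value = 0.0
    for fs in compositions(n, len(probs)):
        # log of n! / (f_1! ... f_k!) * p_1^f_1 ... p_k^f_k
        log_prob = math.lgamma(n + 1) + sum(
            f * math.log(p) - math.lgamma(f + 1) for f, p in zip(fs, probs))
        if chi2(fs) >= target - 1e-9:       # small tolerance for rounding ties
            p_value += math.exp(log_prob)
    return target, p_value


print(exact_chi2_p([4, 0], [0.5, 0.5]))
```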

When the expectations are equal (section 9.3), Chakravarti and Rao (7) have tabulated the exact 5% levels of χ² for samples in which n = Σf_i < 12 and the number of classes, k, < 100. Our n is their T, and our k corresponds to their number of samples. Their tabulated criterion (in their table 1) is our Σf_i², which is equivalent to χ² and quicker to compute.
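With equal expectations n/k, expanding the squares shows that χ² = (k/n)Σf_i² − n, so for fixed n and k the sum of squared frequencies orders samples exactly as χ² does. A short sketch (the function name is ours) that reproduces 10.08 for the random digits of section 9.3:

```python
def chi2_equal_expectations(freqs):
    """chi-square with equal expectations n/k, computed as (k/n) * sum(f^2) - n.

    sum(f - n/k)^2 / (n/k) = (k/n) * sum(f^2) - 2 * sum(f) + n, and sum(f) = n,
    so for fixed n and k chi-square is an increasing linear function of sum(f^2).
    """
    n, k = sum(freqs), len(freqs)
    return k * sum(f * f for f in freqs) / n - n


digits = [22, 24, 28, 23, 18, 33, 29, 17, 31, 25]   # table A 1 digit counts
print(round(chi2_equal_expectations(digits), 2))
```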

EXAMPLE 9.5.1 — When 5 dice were tossed 100 times, the observed and expected numbers of 2's out of 5 were as follows (data from example 1.9.8):

Number of 2's   5       4       3       2        1        0        Total
f               2       3       3       18       42       32       100
F               0.013   0.322   3.214   16.075   40.188   40.188   100.000

Applying the rule that the smallest expectation should be at least 1, we would combine classes 5, 4, 3. Verify that this gives χ² = 7.56, d.f. = 3, P slightly above 0.05. Note that if we combined only the first two classes, this would give χ² = 66.45, d.f. = 4.
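Example 9.5.1 with the recommended pooling, as a sketch (the helper `combine_head` is our own). Pooling the classes 5, 4 and 3 gives a combined expectation of 3.549 and χ² ≈ 7.56 with 3 d.f.

```python
# Example 9.5.1: number of 2's among 5 dice, classes 5, 4, 3, 2, 1, 0
observed = [2, 3, 3, 18, 42, 32]
expected = [0.013, 0.322, 3.214, 16.075, 40.188, 40.188]


def combine_head(obs, exp, m):
    """Pool the first m classes into a single class."""
    return [sum(obs[:m])] + obs[m:], [sum(exp[:m])] + exp[m:]


obs_c, exp_c = combine_head(observed, expected, 3)   # pooled expectation 3.549
chi2 = sum((f - F) ** 2 / F for f, F in zip(obs_c, exp_c))
print(round(chi2, 2), len(obs_c) - 1)
```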

9.6 — Single classifications with estimated expectations. In sections 9.2 and 9.3, the null hypothesis specified the actual numerical values of the expectations in the classes. Often the null hypothesis gives these expectations in terms of one or more population parameters that must be estimated from the sample. This is so, for instance, in testing whether the observed frequencies of 0, 1, 2, ... occurrences will fit the successive terms of a Poisson distribution. Unless the null hypothesis provides the value of μ, this must be estimated from the sample in order to calculate the expected frequencies. The estimate of μ is, of course, the sample mean.

The data of table 8.14.2, to which we have already fitted a Poisson distribution, serve as an example of the test of goodness of fit. The data and subsequent calculations appear in table 9.6.1. Having obtained the expected frequencies, we combine the last four classes (8 or more) so as to reach an expectation of at least 1. The deviations (f − F) and the contributions (f − F)²/F to χ² are calculated as usual and given in the last two columns. We find χ² = 8.26.

The only new step is the rule for counting the number of d.f. in χ²:

d.f. = (No. of classes) − (No. of estimated parameters) − 1

In applying this rule, the number of classes is counted after making any combination of classes that is necessary because of small expectations. Each estimated parameter places one additional restriction on the sizes of the deviations (f − F). The condition that Σ(f − F) = 0 also reduces the likely size of χ². In this example the number of classes (after combining) is 9, and one parameter, μ, was estimated in fitting the distribution. Hence, there are 9 − 1 − 1 = 7 d.f. The P value lies between 0.50 and 0.25. The fit is satisfactory.

TABLE 9.6.1
χ² Test of Goodness of Fit of the Poisson Distribution, Applied to the Numbers of Noxious Weed Seeds Found in 98 Batches

Tests of this kind, in which we compare an observed frequency distribution with a theoretical distribution like the Poisson, the binomial, or the normal, are called goodness of fit tests. For the binomial, the d.f. are 2 less than the number of classes if p is estimated from the data, and 1 less than the number of classes if p is given in advance. With the normal, both parameters μ and σ are usually estimated, so that we subtract 3 from the number of classes.

You now have two methods of testing whether a sample follows the Poisson distribution: the goodness of fit test of this section and the variance test of section 9.3. If the members of the population actually follow Poisson distributions with different means, the variance test is more sensitive in detecting this than the goodness of fit test. The goodness of fit test is a general-purpose test, since any type of difference between the observed and expected numbers, if present in sufficient force, makes χ² large. But if something is known about the nature of the alternative hypothesis, we can often construct a different test that is more powerful for this type of alternative. The same remarks apply to the binomial distribution. A variance test for the binomial is given in section 9.8.
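The two approaches can be compared on the potentilla counts of example 9.3.2. The sketch below fits a Poisson distribution by the sample mean, treats the last class as "7 or more", and pools classes from the upper tail until the last expectation reaches 1; the function and its tail-only pooling rule are our own simplifications, not a procedure from the text.

```python
import math


def poisson_goodness_of_fit(freqs, min_expected=1.0):
    """Goodness of fit of a Poisson distribution to grouped counts.

    freqs[y] is the number of samples showing y occurrences, y = 0, 1, 2, ...
    The mean is estimated from the data, the last class is treated as
    "y or more", and classes are pooled from the upper tail until the last
    expectation reaches min_expected. Returns (chi2, d.f.).
    """
    n = sum(freqs)
    mu = sum(y * f for y, f in enumerate(freqs)) / n
    expected = [n * math.exp(-mu) * mu ** y / math.factorial(y)
                for y in range(len(freqs))]
    expected[-1] = n - sum(expected[:-1])          # upper-tail class
    obs, exp_ = list(freqs), expected
    while exp_[-1] < min_expected and len(exp_) > 2:
        tail_f = obs.pop()
        tail_e = exp_.pop()
        obs[-1] += tail_f
        exp_[-1] += tail_e
    chi2 = sum((f - e) ** 2 / e for f, e in zip(obs, exp_))
    df = len(obs) - 1 - 1                          # one estimated parameter
    return chi2, df


chi2, df = poisson_goodness_of_fit([37, 32, 16, 9, 2, 0, 1, 1])
print(round(chi2, 2), df)
```

On these counts the pooling leaves the classes 0, 1, 2, 3 and "4 or more", giving χ² ≈ 3.06 with 3 d.f. (P between 0.50 and 0.25), so the fit looks adequate. The variance test of example 9.3.2 on the same data gives χ² ≈ 145 with 97 d.f., far beyond the 0.005 level, because the few batches with six and seven seeds are absorbed into a pooled class by the goodness of fit test.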

EXAMPLE 9.6.1 — The numbers of tomato plants attacked by spotted wilt disease were counted in each of 160 areas of 9 plants (8). In all, 261 plants were diseased out of 9 × 160 = 1440 plants. A binomial distribution with n = 9, p = 261/1440, was fitted to the distribution of numbers of diseased plants out of 9. The observed and expected numbers are as follows:

No. of Diseased Plants   0       1       2       3       4      5      6      7      Total
Observed frequency       36      48      38      23      10     3      1      1      160
Expected frequency       26.45   52.70   46.67   24.11   8.00   1.77   0.25   0.03   159.98

Perform the χ² goodness of fit test. Ans. χ² = 10.28, with 4 d.f. after combining. P < 0.05.
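A sketch of the goodness of fit test in example 9.6.1, using the expected frequencies printed in the example rather than recomputing them. The classes 5, 6 and 7 are pooled (total expectation 2.05), leaving 6 classes and 6 − 1 − 1 = 4 d.f. because p was estimated. The result, χ² ≈ 10.27, matches the answer to rounding.

```python
observed = [36, 48, 38, 23, 10, 3, 1, 1]                  # 0..7 diseased
expected = [26.45, 52.70, 46.67, 24.11, 8.00, 1.77, 0.25, 0.03]

# Pool 5, 6 and 7 diseased plants: 1.77 + 0.25 + 0.03 = 2.05, at least 1
obs_c = observed[:5] + [sum(observed[5:])]
exp_c = expected[:5] + [sum(expected[5:])]
chi2 = sum((f - F) ** 2 / F for f, F in zip(obs_c, exp_c))
df = len(obs_c) - 1 - 1        # p was estimated from the data
print(round(chi2, 2), df)
```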


EXAMPLE 9.6.2 — In a series of trials a set of r successes, preceded and followed by a failure, is called a run of length r. Thus the series FSFSSSF contains one run of successes of length 1 and one of length 3. If the probability of a success is p at each trial, the probability of a run of length r may be shown to be p^(r−1) q. In 207 runs of diseased plants in a field, the frequency distribution of lengths of run was as follows:

Length of run r          1     2    3   4   5   Total
Observed frequency f_r   164   33   9   1   0   207

The estimate of p from these data is p̂ = (T − N)/T, where N = Σf_r = 207 is the total number of runs and T = Σrf_r is the total number of successes in these runs. Estimate p; fit the distribution, called the geometric distribution; and test the fit by χ². Ans. χ² = 0.96 with 2 d.f. P > 0.50. Note: the expression (T − N)/T, used for estimating p, is derived from a general method of estimation known as the method of maximum likelihood, and is not meant to be obvious. The expected frequency of runs of length r is Np^(r−1) q.
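A sketch of example 9.6.2 (variable names are ours). The estimate p̂ = (T − N)/T is taken from the example; pooling the runs of length 4 or more is our reading of how the answer arrives at 2 d.f., since the runs of length 5 or more have expectation below 1. The result is χ² ≈ 0.96.

```python
freqs = [164, 33, 9, 1, 0]                              # runs of length 1..5
N = sum(freqs)                                          # 207 runs
T = sum(r * f for r, f in enumerate(freqs, start=1))    # 261 successes
p = (T - N) / T                                         # estimate of p
q = 1 - p

# Expected frequencies N p^(r-1) q for r = 1, 2, 3; runs of length 5 or more
# have expectation below 1, so lengths 4 and over are pooled (expectation N p^3).
expected = [N * p ** (r - 1) * q for r in (1, 2, 3)] + [N * p ** 3]
observed = freqs[:3] + [sum(freqs[3:])]
chi2 = sum((f - F) ** 2 / F for f, F in zip(observed, expected))
df = len(observed) - 1 - 1                              # p was estimated
print(round(p, 4), round(chi2, 2), df)
```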

EXAMPLE 9.6.3 — In table 3.4.1 (p. 71) a normal distribution was fitted to 511 means of samples of pig weight gains. Indicate how you would combine classes in making a goodness of fit test. How many d.f. does your χ² have? Ans. 17 d.f.

EXAMPLE 9.6.4 — Apply the variance test for the Poisson distribution to the data in table 9.6.1. Ans. χ² = 105.3 with 97 d.f. P > 0.25.

9.7 — Two-way classifications. The 2 × C contingency table. We come now to data classified by two different criteria. The simplest case (the 2 × 2 table), in which each classification has only two classes, was discussed in section 8.10. The next simplest case occurs when one classification has only two classes, the other having C > 2 classes. In the example in table 9.7.1, leprosy patients were classified at the start of an experiment according to whether they exhibited little or much infiltration (a measure of a certain type of skin damage). They were also classified into five classes according to the change in their general health during a subsequent 48-week period of treatment (9). The patients did not all receive the same drugs, but since no differences in the effects of these drugs could be detected, the data were combined for this analysis. The table is called a 2 × 5 contingency table.

TABLE 9.7.1
196 Patients Classified According to Change in Health and Degree of Infiltration

                          Change in Health
Degree of      Improvement                   Stationary   Worse   Total
Infiltration   Marked   Moderate   Slight
Little         11       27         42        53           11      144
Much           7        15         16        13           1       52
Total          18       42         58        66           12      196

The question at issue is whether the change in health is related to the initial degree of infiltration. The χ² test extends naturally to 2 × C tables. The overall proportion of patients with little infiltration is 144/196. On the null hypothesis of no relationship between degree of infiltration and change in health, we expect to find (18)(144)/196 = 13.22 patients with little infiltration and marked improvement, as against 11 observed. As before, the rule for finding an expected number is (row total)(column total)/(grand total). The expected numbers F and the deviations (f − F) are shown in table 9.7.2. Note that only four expected numbers need be calculated: the rest can be found by subtraction.

TABLE 9.7.2
Expected Numbers and Deviations Calculated From Table 9.7.1

                          Change in Health
Degree of      Improvement                   Stationary   Worse    Total
Infiltration   Marked   Moderate   Slight

Expected numbers, F
Little         13.22    30.86      42.61     48.49        8.82     144.00
Much           4.78     11.14      15.39     17.51        3.18     52.00
Total          18.00    42.00      58.00     66.00        12.00    196.00

Deviations, (f − F)
Little         −2.22    −3.86      −0.61     +4.51        +2.18    0.00
Much           +2.22    +3.86      +0.61     −4.51        −2.18    0.00


The value of χ² is

$$\chi^2 = \sum \frac{(f - F)^2}{F} = \frac{(-2.22)^2}{13.22} + \frac{(+2.22)^2}{4.78} + \cdots + \frac{(-2.18)^2}{3.18} = 6.87,$$

taken over the ten cells in the table. The number of d.f. is (R − 1)(C − 1), where R, C are the numbers of rows and columns, respectively. In this example R = 2, C = 5 and we have 4 d.f. This rule for d.f. is in line with the fact that when four of the deviations in a row are known, all the rest can be found. With χ² = 6.87, d.f. = 4, the probability lies between 0.25 and 0.10.
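The 2 × C computation of this section, with each expectation found as (row total)(column total)/(grand total), is sketched below (variable names are ours). Carried at full precision it gives χ² = 6.88; the 6.87 above comes from the rounded deviations.

```python
little = [11, 27, 42, 53, 11]
much = [7, 15, 16, 13, 1]
table = [little, much]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, f in enumerate(row):
        F = row_totals[i] * col_totals[j] / grand   # (row)(column)/(grand)
        chi2 += (f - F) ** 2 / F
df = (len(table) - 1) * (len(col_totals) - 1)
print(round(chi2, 2), df)
```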

Although this test has not rejected the null hypothesis, the deviations show a systematic pattern. In the "much infiltration" class, the observed numbers are higher than expected for patients showing any degree of improvement, and lower than expected for patients classified as stationary or worse. The reverse is, of course, true for the "little infiltration" class. Contrary to the null hypothesis, these deviations suggest that patients with much infiltration progressed on the whole better than those with little infiltration. This suggestion will be studied further in section 9.10.

9.8 — The variance test for homogeneity of the binomial distribution. In the preceding example we obtained a 2 × C contingency table because the data were classified into 2 classes by one criterion and into C classes by a second criterion. Alternatively, we may have recorded some binomial variate p_i = a_i/n_i in each of C independent samples, where i goes from 1 to C and n_i is the size of the ith sample. The objective now is to examine whether the true p_i vary from sample to sample. Data of this type occur very frequently.

A quicker method of computing χ², which is particularly appropriate in this situation, was devised by Snedecor and Irwin (10). It will be illustrated by the preceding example. Think of the columns in table 9.8.1 as representing C = 5 samples.


TABLE 9.8.1
Alternative Calculation of χ² for the Data in Table 9.7.1

Degree of        Improvement                     Stationary   Worse    Total
Infiltration     Marked   Moderate   Slight
Little           11       27         42          53           11       144
Much (a_i)       7        15         16          13           1        52 (A)
Total (n_i)      18       42         58          66           12       196 (N)
p_i = a_i/n_i    0.3889   0.3571     0.2759      0.1970       0.0833   0.26531 (p̄)


First calculate the proportion p_i = a_i/n_i of "much infiltration" patients in each column, and the corresponding overall proportion p̄ = A/N = 52/196 = 0.26531. Then,

$$\chi^2 = \frac{\sum p_i a_i - \bar{p}A}{\bar{p}\bar{q}} = \frac{(0.3889)(7) + \cdots + (0.0833)(1) - (0.26531)(52)}{(0.26531)(0.73469)} = 6.88, \qquad (9.8.1)$$

as before, with 4 d.f.
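Formulas 9.8.1 and 9.8.2 are algebraically identical, since Σn_i(p_i − p̄)² = Σp_ia_i − 2p̄A + p̄A. The sketch below evaluates both with unrounded p_i (names are ours) and gets 6.88 from each.

```python
a = [7, 15, 16, 13, 1]        # "much infiltration" count in each column
n = [18, 42, 58, 66, 12]      # column totals
A, N = sum(a), sum(n)
p_bar = A / N                 # 0.26531
q_bar = 1 - p_bar
p = [ai / ni for ai, ni in zip(a, n)]

# Formula 9.8.1: (sum p_i a_i - p A) / (p q)
chi2_981 = (sum(pi * ai for pi, ai in zip(p, a)) - p_bar * A) / (p_bar * q_bar)
# Formula 9.8.2: sum n_i (p_i - p)^2 / (p q)
chi2_982 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / (p_bar * q_bar)
print(round(chi2_981, 2), round(chi2_982, 2))
```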

If p_i is the variable of interest, you will want to calculate these values anyway in order to examine the results. Extra decimals should be carried to ensure accuracy in computing χ², particularly when the a_i are large. The computations are a little simpler when the p_i are derived from the row with the smaller numbers.

This formula for χ² can be written, alternatively,

$$\chi^2 = \frac{\sum n_i (p_i - \bar{p})^2}{\bar{p}\bar{q}} \qquad (9.8.2)$$
If the binomial estimates p_i are all based on the same sample size n, χ² becomes

$$\chi^2 = \sum_{i=1}^{C} \frac{(p_i - \bar{p})^2}{\bar{p}\bar{q}/n} = \frac{(C - 1)\,s_p^2}{\bar{p}\bar{q}/n} \qquad (9.8.3)$$

In this form, χ² is essentially a comparison of the observed variance s_p² among the p_i with the variance p̄q̄/n that the p_i would have if they were independent samples from the same binomial distribution. The same interpretation can be shown to apply to expression (9.8.2) for χ². A high value of χ² denotes that the true proportions differ from sample to sample.

This test, sometimes called the variance test for homogeneity of the 
binomial distribution, has many applications. Different investigators 
may have estimated the same proportion in different samples, and we 
wish to test whether the estimates agree, apart from sampling errors. In 
a study of an attribute in human families, where each sample is a family, 
a high value of x 2 indicates that members of the same family tend to be 
alike with regard to this attribute. 

When some of the sample sizes n_i are small, some of the expectations n_ip̄ and n_iq̄ will be small. The χ² test can still be used with some expectations as low as 1, provided that most of the expectations (say 4 out of 5) are substantially larger. (Recent results [11] suggest that this advice is conservative.) In some genetic and family studies, all the n_i are small. For this case a good approximation to the significance levels of the exact χ² distribution has been given by Haldane (12), though the computations are laborious. When χ² has more than 30 d.f. and the n_i are all equal (= n), the exact χ² is approximately normally distributed with

Mean = (C − 1)N/(N − 1)

Variance = 2 (C - l )( 2 )(jv - 1) 2 (N - 2)(JV - 3) [* ~ 


= 2(C - 



where C is the number of samples and N = Cn.

When the p_i vary from column to column, as indicated by a high value of χ², the binomial formula √(p̄q̄/N) underestimates the standard error of the overall proportion p̄ for the combined sample. A more nearly correct formula (section 17.5) for the standard error of p̄ in this situation is

$$\text{s.e.}(\bar{p}) = \frac{1}{\bar{n}}\sqrt{\frac{\sum a_i^2 - 2\bar{p}\sum a_i n_i + \bar{p}^2 \sum n_i^2}{C(C - 1)}}, \qquad (9.8.4)$$

where C is the number of samples and n̄ = N/C is the mean sample size.
EXAMPLE 9.8.1 — Ten samples of 5 mice from the same laboratory were injected with the same dose of bact. typhi. murium (13). The numbers of mice dying (out of 5) were as follows: 3, 1, 5, 5, 3, 2, 4, 2, 3, 5. Test whether the proportion dying can be regarded as constant from sample to sample. Ans. χ² = 18.1, d.f. = 9, P < 0.05. Since the death rate is found so often to vary within the same laboratory, a standard agent is usually tested along with each new agent, because comparisons made over time cannot be trusted.


EXAMPLE 9.8.2 — Uniform doses of Danysz bacillus were injected into rats, the sizes of the samples being dictated by the numbers of animals available at the dates of injection. These sizes, the numbers of surviving rats, and the proportion surviving, are as follows:

Number in sample       40       12       22       11       37       20
Number surviving       9        2        3        1        2        3
Proportion surviving   0.2250   0.1667   0.1364   0.0909   0.0541   0.1500

Test the null hypothesis that the probability of survival is the same in all samples. Ans. χ² = 4.97, d.f. = 5, P = 0.43.

EXAMPLE 9.8.3 — In another test with four samples of inoculated rats, χ² was 6.69, P = 0.086. Combine the values of χ² for the two tests. Ans. χ² = 11.66, d.f. = 8, P = 0.17.
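Because independent χ² values and their d.f. add, the combined test of example 9.8.3 needs only the upper tail of χ² with 8 d.f. For an even number of d.f. this tail has a closed form, used in the sketch below (`chi2_sf_even_df` is our own helper, not a library function); it gives P ≈ 0.167.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square >= x) for an even number of d.f., from the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("closed form requires an even number of d.f.")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Independent chi-squares add, and so do their degrees of freedom
chi2_total = 4.97 + 6.69          # examples 9.8.2 and 9.8.3
df_total = 5 + 3                  # six samples and four samples
p_total = chi2_sf_even_df(chi2_total, df_total)
print(round(chi2_total, 2), df_total, round(p_total, 2))
```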

EXAMPLE 9.8.4 — Burnett (14) tried the effect of five storage locations on the viability of seed corn. In the kitchen garret, 111 kernels germinated among 120 tested; in a closed toolshed, 55 out of 60; in an open toolshed, 55 out of 60; outdoors, 41 out of 48; and in a dry garret, 50 out of 60. Calculate χ² = 5.09, d.f. = 4, P = 28%.

EXAMPLE 9.8.5 — In 13 families in Baltimore, the numbers of persons (n_i) and the numbers (a_i) who had consulted a doctor during the previous 12 months were as follows: 7, 0; 6, 0; 5, 2; 5, 5; 4, 1; 4, 2; 4, 2; 4, 2; 4, 0; 4, 0; 4, 4; 4, 0; 4, 0. Compute the overall percentage who had consulted a doctor and the standard error of the percentage. Note: One would expect the proportion who had seen a doctor to vary from family to family. Verify this by finding χ² = 35.6, d.f. = 12, P < 0.005. Consequently, formula 9.8.4 is used to estimate the s.e. of p̄. Ans. Percentage = 100p̄ = 30.5%, s.e. = 10.5%. (These data were selected from a large sample for illustration.)
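Example 9.8.5 combines the variance test with formula 9.8.4. The sketch below (names are ours) computes both at full precision: p̄ = 18/59, χ² ≈ 35.65 with 12 d.f., and s.e. ≈ 10.55%, close to the 35.6 and 10.5% quoted in the answer. The residual sum Σ(a_i − p̄n_i)² used in the code is the bracketed numerator of formula 9.8.4 written compactly.

```python
import math

# (n_i, a_i): family size and number who had consulted a doctor
families = [(7, 0), (6, 0), (5, 2), (5, 5), (4, 1), (4, 2), (4, 2),
            (4, 2), (4, 0), (4, 0), (4, 4), (4, 0), (4, 0)]
C = len(families)
N = sum(n for n, _ in families)
A = sum(a for _, a in families)
p_bar = A / N
q_bar = 1 - p_bar

# Variance test (9.8.2): sum n_i (p_i - p)^2 / (p q) = sum (a_i - p n_i)^2 / (n_i p q)
chi2 = sum((a - p_bar * n) ** 2 / n for n, a in families) / (p_bar * q_bar)

# Formula 9.8.4: s.e. = (1/nbar) * sqrt(sum (a_i - p n_i)^2 / (C (C - 1)))
nbar = N / C
resid_ss = sum((a - p_bar * n) ** 2 for n, a in families)
se = math.sqrt(resid_ss / (C * (C - 1))) / nbar
print(round(100 * p_bar, 2), round(chi2, 2), C - 1, round(100 * se, 2))
```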

9.9 — Further examination of the data. When the initial χ² test shows a significant value, the remarks made in section 9.2 about further examination of the data apply here also. Subsequent tests are made that may help to explain the high value of χ². Frequently, as already remarked, the investigator proceeds at once to these tests, omitting the initial χ² test as not informative.

Decker and Andre (15) investigated the effect of a short, sudden exposure to cold on the adult chinch bug. Since experimental insects had to be gathered in the field, the degree of heterogeneity in the insects was unknown, and the investigators faced the problem as to whether they could reproduce their results. Ten adult bugs were placed in each of 50 tubes and exposed for 15 minutes at −8°C. For this illustration the counts of the numbers dead in the individual tubes were combined at random into 5 lots of 10 tubes each; that is, into lots of 100 chinch bugs. The numbers dead were 14, 14, 23, 17, and 20 insects. From these data, χ² = 4.22, d.f. = 4, P = 0.39. The results are in accord with the hypothesis that every adult bug was subject to the same chance of being killed by the exposure.
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In a second sample of 500 adults, handled in the same manner except 
that they were exposed at — 9°C., the numbers dead in groups of 1 00 were 
38, 30, 30, 40, 27. The % 2 value of 5.79 again verifies the technique, 
showing only sampling variation from the estimated mortality of 33%. 

The gratifying uniformity in the results leads one to place some confidence in the surprising finding that the death rates at -8°C. and -9°C. were markedly different. The total numbers dead in the two samples of 500 were 88 and 165. The result, χ² = 31.37 with d.f. = 1, P less than 0.0002, provides convincing evidence that a rise in mortality with the lowering of temperature from -8°C. to -9°C. is a characteristic of the population, not merely an accident of sampling.
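The three χ² values quoted for the chinch bug data can be reproduced with a short sketch. It assumes, as in the experiment, equal lots of n = 100 bugs, so the binomial variance test reduces to Σ(a_i - n p̄)²/(n p̄ q̄) with C - 1 d.f.; the function names are illustrative, not from the text.

```python
import math


def homogeneity_chi2(dead, n):
    """Binomial variance test for C lots of equal size n.

    chi2 = sum (a_i - n*pbar)^2 / (n * pbar * qbar), with C - 1 d.f.
    Returns (chi2, d.f., pbar).
    """
    c = len(dead)
    pbar = sum(dead) / (c * n)
    expected = n * pbar
    chi2 = sum((a - expected) ** 2 for a in dead) / (expected * (1 - pbar))
    return chi2, c - 1, pbar


def two_proportion_chi2(a1, n1, a2, n2):
    """Uncorrected 1 d.f. chi-square comparing two binomial proportions."""
    pbar = (a1 + a2) / (n1 + n2)
    diff = a1 / n1 - a2 / n2
    return diff ** 2 / (pbar * (1 - pbar) * (1 / n1 + 1 / n2))


dead_m8 = [14, 14, 23, 17, 20]   # lots of 100 bugs exposed at -8 C
dead_m9 = [38, 30, 30, 40, 27]   # lots of 100 bugs exposed at -9 C

chi2_m8, df_m8, p_m8 = homogeneity_chi2(dead_m8, 100)
chi2_m9, df_m9, p_m9 = homogeneity_chi2(dead_m9, 100)
chi2_temp = two_proportion_chi2(sum(dead_m8), 500, sum(dead_m9), 500)

# For 4 d.f. the chi-square upper tail has the closed form exp(-x/2)(1 + x/2)
upper_m8 = math.exp(-chi2_m8 / 2) * (1 + chi2_m8 / 2)

print(f"-8 C: chi2 = {chi2_m8:.2f}, d.f. = {df_m8}, mortality = {p_m8:.3f}, P = {upper_m8:.2f}")
print(f"-9 C: chi2 = {chi2_m9:.2f}, d.f. = {df_m9}, mortality = {p_m9:.3f}")
print(f"-8 vs -9 C: chi2 = {chi2_temp:.2f}, d.f. = 1")
```

The closed-form tail is used only because both homogeneity tests have exactly 4 d.f.; other degrees of freedom need a table or a general chi-square routine.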

The ease of applying a test of experimental technique makes its use almost a routine procedure except in highly standardized processes. It is necessary merely to collect the data in several small groups, chosen with regard to the types of experimental variation thought likely to be present, instead of in one mass. The additional information may modify conclusions and subsequent procedures profoundly.

In this example the sum of the three values of χ² is 4.22 + 5.79 + 31.37 = 41.38, with 9 d.f. If the initial χ² is calculated from the 2 × 10 contingency table formed by the complete data, its value is also found to be 41.38, with 9 d.f. This agreement between the two values is a fluke, which does not hold generally in 2 × C tables. For 2 × C and R × C tables, a method of computing the component parts so that they add to the initial total χ² is available (16). In these data this method amounts to using the same denominator pq = (0.253)(0.747), calculated from the total mortality, in finding all χ² values. Instead, for the 4 d.f. χ² at -8°C. we used pq = (0.176)(0.824), appropriate to that part of the data, and at -9°C. we used pq = (0.330)(0.670). The additive χ² values give 3.24 + 6.77 + 31.37 = 41.38. However, when it has been shown that the mortality differs at -8°C. and -9°C., use of a pooled p for the individual homogeneity tests at -8°C. and -9°C. is invalid. The non-additive method is recommended, except in a quick preliminary look at the data.
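The additive and non-additive partitions of the 9 d.f. can be compared in a minimal sketch. The helper below computes Σn_i(p_i - p̄)²/(p̄q̄) for a 2 × C table; the optional `pq` argument (a name chosen here for illustration) substitutes the overall pooled denominator, which is what makes the components add exactly.

```python
def chi2_2xC(counts, sizes, pq=None):
    """Chi-square for a 2 x C table of binomial counts a_i out of n_i.

    chi2 = sum n_i (p_i - pbar)^2 / (pbar * qbar); if `pq` is given it is
    used as the denominator instead of the value from these data.
    """
    pbar = sum(counts) / sum(sizes)
    denom = pq if pq is not None else pbar * (1 - pbar)
    return sum(n * (a / n - pbar) ** 2 for a, n in zip(counts, sizes)) / denom


minus8 = [14, 14, 23, 17, 20]
minus9 = [38, 30, 30, 40, 27]
sizes = [100] * 5

# Initial chi-square from the complete 2 x 10 table (9 d.f.)
total = chi2_2xC(minus8 + minus9, sizes * 2)

# Additive components: every denominator uses the overall (0.253)(0.747)
pq_all = 0.253 * 0.747
within_m8 = chi2_2xC(minus8, sizes, pq_all)
within_m9 = chi2_2xC(minus9, sizes, pq_all)
between = chi2_2xC([sum(minus8), sum(minus9)], [500, 500], pq_all)

# Non-additive components: each homogeneity test uses its own pbar*qbar
own_m8 = chi2_2xC(minus8, sizes)
own_m9 = chi2_2xC(minus9, sizes)

print(f"total = {total:.2f}")
print(f"additive: {within_m8:.2f} + {within_m9:.2f} + {between:.2f}")
print(f"non-additive: {own_m8:.2f} + {own_m9:.2f} + {between:.2f}")
```

With equal lot sizes the additive split reproduces the 2 × 10 total up to floating-point error; the non-additive version agrees only approximately here, which is the fluke noted in the text.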

9.10 — Ordered classifications. In the leprosy example of section 9.7, 
the classes (marked improvement, moderate improvement, slight im- 
provement, stationary, worse) are an example of an ordered classification. 
Such classifications are common in the study of human behavior and 
preferences, and more generally whenever different degrees of some phe- 
nomenon can be recognized. The problem of utilizing the knowledge 
that we possess about this ordering has attracted considerable attention 
in recent years. 

With a single classification of Poisson variables, the ordering might lead us to expect that if the null hypothesis μ_i = μ does not hold, an alternative μ_1 ≤ μ_2 ≤ μ_3 ≤ ... should hold, where the subscripts represent the order. For instance, if working conditions in a factory have been classified as Excellent, Good, Fair, we might expect that if the number of defective articles per worker varies with working conditions, the order should be μ_1 ≤ μ_2 ≤ μ_3. Similarly, with ordered columns in a 2 × C contingency table, the alternative p_1 ≤ p_2 ≤ p_3 ≤ ... might be expected. χ² tests designed to detect this type of alternative have been developed by Bartholomew (17). The computations are quite simple.

Another approach, used by numerous workers (9), (18), (19), is to attach a score to each class so that an ordered scale is created. To illustrate from the leprosy example, we assigned scores of 3, 2, 1, respectively, to the Marked, Moderate, and Slight Improvement classes, 0 to the Stationary class, and -1 to the Worse class. These scores are based on the judgment that the five classes constructed by the expert represent equal gradations on a continuous scale. We considered giving a score of +4 to the Marked Improvement class and -2 to the Worse class, since the expert seemed to examine a patient at greater length before assigning him to one of these extreme classes, but rejected this since our impression may have been erroneous.

Having assigned the scores, we may think of the leprosy data as consisting of two independent samples of 144 and 52 patients, respectively. (See table 9.10.1.) For each patient we have a discrete measure X of his change in health, where X takes only the values 3, 2, 1, 0, -1. We can estimate the average change in health for each sample, with its standard error, and can test the null hypothesis that this average change is the same in the two populations. For this test we use the ordinary two-sample t-test as applied to grouped data. The calculations appear in table 9.10.1. On the X scale the average change in health is +1.269 for patients with much infiltration and +0.819 for those with little infiltration. The difference, D, is 0.450, with standard error ±0.172 (194 d.f.), computed in the usual way. The value of t is 0.450/0.172 = 2.616, with P < 0.01. Contrary to the initial χ² test, this test reveals a significantly greater amount of progress for the patients with much infiltration.
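The grouped-data t-test of table 9.10.1 can be sketched as follows. The class frequencies act as weights, so ΣfX and ΣfX² replace the usual sums; the function and key names are illustrative. Carried at full precision the arithmetic gives t ≈ 2.613, the 2.616 in the text coming from the rounded means.

```python
import math


def grouped_t(scores, freq1, freq2):
    """Two-sample t-test with pooled variance for data grouped into scored classes.

    Returns a dict with the group means, variances s^2, s_D, t (mean 2 minus mean 1) and d.f.
    """
    def summarize(freq):
        n = sum(freq)
        sum_fx = sum(f * x for f, x in zip(freq, scores))
        sum_fx2 = sum(f * x * x for f, x in zip(freq, scores))
        ss = sum_fx2 - sum_fx ** 2 / n          # sum of squared deviations
        return n, sum_fx / n, ss

    n1, mean1, ss1 = summarize(freq1)
    n2, mean2, ss2 = summarize(freq2)
    df = n1 + n2 - 2
    pooled_s2 = (ss1 + ss2) / df
    s_d = math.sqrt(pooled_s2 * (1 / n1 + 1 / n2))
    return {"mean1": mean1, "mean2": mean2,
            "s2_1": ss1 / (n1 - 1), "s2_2": ss2 / (n2 - 1),
            "s_d": s_d, "t": (mean2 - mean1) / s_d, "df": df}


scores = [3, 2, 1, 0, -1]          # Marked, Moderate, Slight, Stationary, Worse
little = [11, 27, 42, 53, 11]      # little infiltration (144 patients)
much = [7, 15, 16, 13, 1]          # much infiltration (52 patients)

res = grouped_t(scores, little, much)
print({k: round(v, 3) for k, v in res.items()})
```

Returning the two group variances alongside t makes it easy to check, before pooling, that they are similar.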

The assignment of scores is appropriate when (i) the phenomenon in question is one that could be measured on a continuous scale if the instruments of measurement were good enough, and (ii) the ordered classification can be regarded as a kind of grouping of this continuous scale, or as an attempt to approximate the continuous scale by a cruder scale that is the best we can do in the present state of knowledge. The process is similar to that which occurs in many surveys. The householder is shown five specific income classes and asked to indicate the class within which his income falls, without naming his actual income. Some householders name an incorrect class, just as an expert makes some mistakes in classification when this is difficult.

The advantage in assigning scores is that the more flexible and power- 
ful methods of analysis that have been developed for continuous variables 
become available. One can begin to think of the sizes of the average 
differences between different groups in a study, and compare the dif- 
ference between groups A and B with that between groups E and F. 
Regressions of the group means X̄ on a further variable Z can be worked out. The relative variability of different groups can be examined by computing s for each group.

TABLE 9.10.1
Analysis of the Leprosy Data by Assigned Scores

(Data with assigned scores)
                                         Number of patients
Change in health          X     Little infiltration   Much infiltration
Marked improvement        3             11                    7
Moderate improvement      2             27                   15
Slight improvement        1             42                   16
Stationary                0             53                   13
Worse                    -1             11                    1
Total: Σf                              144                   52

(Computations)
                          Little      Much
ΣfX                         118         66
X̄ = ΣfX/Σf                0.819      1.269
ΣfX²                        260        140
(ΣfX)²/Σf                  96.7       83.8
Σfx²                      163.3       56.2
Σf - 1                      143         51
s²                        1.142      1.102

Pooled s² = (163.3 + 56.2)/(143 + 51) = 1.131
s_D = √[(1.131)(1/144 + 1/52)] = √0.0296 = 0.172
t = D/s_D = (1.269 - 0.819)/0.172 = 2.616,   d.f. = 194,   P < 0.01

This approach assumes that the standard methods of analysis of continuous variables, like the t-test, can be used with an X variable that is discrete and takes only a few values. As noted in section 5.8 on scales with limited values, the standard methods appear to work well enough for practical use. However, heterogeneity of variance and correlation between s² and X̄ are more frequently encountered because of the discrete scale. If most of the patients in a group show marked improvement, most of their X's will be 3, and s² will be small. Pooling of variances should not be undertaken without examining the individual s². In the leprosy example the two s² were 1.142 and 1.102 (table 9.10.1), and this difficulty was not present.
The chief objection to the assignment of scores is that the method is more or less arbitrary. Two investigators may assign different scores to the same set of data. In our experience, however, moderate differences between two scoring systems seldom produce marked differences in the conclusions drawn from the analysis. In the leprosy example, the alternative scores 4, 2, 1, 0, -2 give t = 2.549 as against t = 2.616 in the analysis in table 9.10.1. Some classifications present particular difficulty. If the
degrees of injury to persons in accidents are recorded as slight, moderate, 
severe, disabling, and fatal, there seems no entirely satisfactory way of 
placing the last two classes on the same scale as the first three. 

Several alternative principles have been used to construct scores. In 
studies of different populations of school children, K. Pearson (20) as- 
sumed that the underlying continuous variate was normally distributed 
in a standard population of school children. If the classes are regarded as 
a grouping of this normal distribution, the class boundaries for the normal 
variate are easily found. The score assigned to a class is the mean of the 
normal variate within the class. A related approach due to Bross (21) also 
uses a standard population but does not assume normality. The score 
(ridit) given to a class is the relative frequency up to the midpoint of that 
class in the standard population. When the experimental treatments are 
different doses of a toxic or protective agent in biological assay, Ipsen (22) 
shows how to assign scores so that the resulting variate has a linear regres- 
sion on some chosen function of the dose, the ratio of the variance due 
to regression to the total variance being maximized. Fisher (23) assigns scores so as to maximize the F-ratio of treatments to experimental error as defined in section 10.5. The maximin method of Abelson and Tukey (24) chooses the assigned scores so that, among all sets of true scores consistent with the investigator's knowledge about the ordering of the classes, the set least correlated with the assigned scores still has as large a squared correlation r² with them as possible. This approach, like Bartholomew's, avoids any arbitrary assumptions about the nature of the true scale.
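Two of the scoring principles above, Pearson's normal scores and Bross's ridits, depend only on the class frequencies in a standard population, so they are easy to sketch. The choice of standard population here (all 196 leprosy patients combined, classes ordered from Worse to Marked Improvement) is a hypothetical one made for illustration; the text does not specify one for these data.

```python
import math
from statistics import NormalDist


def ridits(freqs):
    """Ridit of each class: relative frequency up to the class midpoint
    in the standard population (classes listed in increasing order)."""
    total = sum(freqs)
    out, below = [], 0
    for f in freqs:
        out.append((below + f / 2) / total)
        below += f
    return out


def normal_scores(freqs):
    """Mean of a standard normal variate within each class, treating the
    classes as a grouping of N(0, 1) in the standard population.
    Mean over (a, b) = (pdf(a) - pdf(b)) / (cdf(b) - cdf(a))."""
    nd = NormalDist()
    total = sum(freqs)
    lower, cum, out = -math.inf, 0, []
    for f in freqs:
        cum += f
        upper = math.inf if cum == total else nd.inv_cdf(cum / total)
        pdf_lo = 0.0 if math.isinf(lower) else nd.pdf(lower)
        pdf_hi = 0.0 if math.isinf(upper) else nd.pdf(upper)
        out.append((pdf_lo - pdf_hi) / (f / total))
        lower = upper
    return out


# Hypothetical standard population: all 196 patients,
# classes ordered Worse, Stationary, Slight, Moderate, Marked.
standard = [12, 66, 58, 42, 18]
much = [1, 13, 16, 15, 7]

r = ridits(standard)
z = normal_scores(standard)
mean_ridit_much = sum(f * ri for f, ri in zip(much, r)) / sum(much)
print([round(x, 3) for x in r])
print([round(x, 3) for x in z])
print(f"mean ridit, much infiltration: {mean_ridit_much:.3f}")
```

By construction the mean ridit of the standard population is 0.5 and its mean normal score is 0, so a group's mean ridit above 0.5 indicates a shift towards the better classes relative to the standard.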

EXAMPLE 9.10.1 — In the leprosy data, verify the value of t = 2.549 quoted for the scoring 4, 2, 1, 0, -2.
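One way to check this example is to rerun the grouped t-test under both scoring systems. This is a compact sketch (the helper name is illustrative). Unrounded arithmetic gives about 2.545 for the alternative scores, consistent with the 2.549 quoted once rounding of the means is allowed for.

```python
import math


def grouped_t(scores, freq1, freq2):
    """Pooled two-sample t on grouped, scored data (mean 2 minus mean 1)."""
    def mean_and_ss(freq):
        n = sum(freq)
        sx = sum(f * x for f, x in zip(freq, scores))
        ss = sum(f * x * x for f, x in zip(freq, scores)) - sx * sx / n
        return n, sx / n, ss

    n1, m1, ss1 = mean_and_ss(freq1)
    n2, m2, ss2 = mean_and_ss(freq2)
    pooled_s2 = (ss1 + ss2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_s2 * (1 / n1 + 1 / n2))


# Classes: Marked, Moderate, Slight, Stationary, Worse
little = [11, 27, 42, 53, 11]
much = [7, 15, 16, 13, 1]

t_equal_steps = grouped_t([3, 2, 1, 0, -1], little, much)
t_wider_ends = grouped_t([4, 2, 1, 0, -2], little, much)
print(f"scores 3,2,1,0,-1: t = {t_equal_steps:.3f}")
print(f"scores 4,2,1,0,-2: t = {t_wider_ends:.3f}")
```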

9.11 — Test for a linear trend in proportions. When interest is centered on the proportions p_i in a 2 × C contingency table, there is another way of viewing the data. Table 9.11.1 shows the leprosy data with the assigned scores X_i, but in this case the variable that we analyze is p_i, the proportion of patients with much infiltration. The contention now is that if these patients have fared better than patients with little infiltration, the values of p_i should increase as we move from the Worse class (X = -1) towards the Marked Improvement class (X = 3).

If this is so, the regression coefficient of p_i on X_i should be a good test criterion. On the null hypothesis (no relation between p_i and X_i) each p_i is distributed about the same mean, estimated by p̄, with variance p̄q̄/n_i. The regression coefficient b is calculated as usual, except that each p_i must be weighted by n_i, the sample size on which it is based (that is, inversely to its variance p̄q̄/n_i).

TABLE 9.11.1
Testing a Linear Regression of p_i on the Score (Leprosy Data)

                              Improvement
Degree of           Marked   Moderate   Slight   Stationary   Worse    Total
Infiltration
Little                11        27        42         53         11       144
Much (a_i)             7        15        16         13          1        52
Total (n_i)           18        42        58         66         12       196 (N)
p_i = a_i/n_i       0.3889    0.3571    0.2759     0.1970     0.0833   0.2653 (p̄)
Score X_i              3         2         1          0         -1

The numerator and denominator of b are computed as follows:

$$\text{Num.} = \sum n_i(p_i - \bar p)(X_i - \bar X) = \sum n_i p_i X_i - \frac{(\sum n_i p_i)(\sum n_i X_i)}{\sum n_i} = \sum a_i X_i - \frac{(\sum a_i)(\sum n_i X_i)}{N} = 66 - \frac{(52)(184)}{196} = 66 - 48.82 = 17.18$$

$$\text{Den.} = \sum n_i (X_i - \bar X)^2 = \sum n_i X_i^2 - \frac{(\sum n_i X_i)^2}{N} = 400 - \frac{(184)^2}{196} = 400 - 172.7 = 227.3$$

This gives b = 17.18/227.3 = 0.0756. Its standard error is

$$s_b = \sqrt{\bar p \bar q / \text{Den.}} = \sqrt{(0.2653)(0.7347)/227.3} = 0.0293$$

The normal deviate for testing the null hypothesis β = 0 is

$$Z = b/s_b = 0.0756/0.0293 = 2.580, \qquad P = 0.0098.$$
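The weighted regression of p_i on X_i reduces to a few sums of the a_i, n_i and X_i, as the formulas above show. A minimal sketch (function name illustrative) follows; note that 184²/196 = 172.73, so Den. ≈ 227.3, and the two-tailed P is taken from the standard normal.

```python
import math
from statistics import NormalDist


def trend_test(a, n, x):
    """Regression of p_i = a_i/n_i on scores x_i, each p_i weighted by n_i.

    Returns (b, s_b, Z, two-tailed P) for the null hypothesis beta = 0.
    """
    N, A = sum(n), sum(a)
    pbar = A / N
    sum_ax = sum(ai * xi for ai, xi in zip(a, x))
    sum_nx = sum(ni * xi for ni, xi in zip(n, x))
    sum_nxx = sum(ni * xi * xi for ni, xi in zip(n, x))
    num = sum_ax - A * sum_nx / N          # sum n_i (p_i - pbar)(x_i - xbar)
    den = sum_nxx - sum_nx ** 2 / N        # sum n_i (x_i - xbar)^2
    b = num / den
    s_b = math.sqrt(pbar * (1 - pbar) / den)
    z = b / s_b
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return b, s_b, z, p


much = [7, 15, 16, 13, 1]          # a_i: much infiltration
totals = [18, 42, 58, 66, 12]      # n_i
scores = [3, 2, 1, 0, -1]          # X_i

b, s_b, z, p = trend_test(much, totals, scores)
print(f"b = {b:.4f}, s_b = {s_b:.4f}, Z = {z:.3f}, P = {p:.4f}")
```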

Although it is not obvious at first sight, Yates (18) showed that this regression test is essentially the same as the t-test in section 9.10 of the difference between the mean scores in the Little and Much infiltration classes. In this example the regression test gave Z = 2.580 while the t-test gave t = 2.616 (194 d.f.). The difference in results arises because the two approaches use slightly different large-sample approximations to the exact distributions of Z and t with these discrete data.


EXAMPLE 9.11.1 — Armitage (19) quotes the following data by Holmes and Williams for the relation in children between size of tonsils and the proportion of children who are carriers of Streptococcus pyogenes in the nose.

                          X = Score Given to Size of Tonsils
Types of Children            0         1         2       Total Children
Carriers (a_i)              19        29        24          72 (A)
Non-carriers               497       560       269        1326
Total (n_i)                516       589       293        1398 (N)
Carrier-rate (p_i)       0.0368    0.0492    0.0819     0.051502 (p̄)

Calculate: (i) the normal deviate Z for testing the linear regression of the proportion of carriers on size of tonsils, (ii) the value of t for comparing the difference between the mean size of tonsils in carriers and non-carriers. Ans. (i) Z = 2.681, (ii) t = 2.686, with 1396 d.f.
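Both answers to this example can be checked in one short script: (i) the n_i-weighted regression of p_i on X, and (ii) a pooled t comparing the mean tonsil score of carriers and non-carriers. Variable names are illustrative; the closeness of Z and t is the equivalence attributed to Yates in the text.

```python
import math

scores = [0, 1, 2]                 # score for size of tonsils
carriers = [19, 29, 24]
non_carriers = [497, 560, 269]
sizes = [c + nc for c, nc in zip(carriers, non_carriers)]

# (i) Normal deviate for the regression of p_i = a_i/n_i on the score
N, A = sum(sizes), sum(carriers)
pbar = A / N
sum_nx = sum(n * x for n, x in zip(sizes, scores))
num = sum(a * x for a, x in zip(carriers, scores)) - A * sum_nx / N
den = sum(n * x * x for n, x in zip(sizes, scores)) - sum_nx ** 2 / N
z = (num / den) / math.sqrt(pbar * (1 - pbar) / den)


# (ii) Pooled t comparing mean tonsil score of carriers and non-carriers
def mean_and_ss(freq):
    n = sum(freq)
    sx = sum(f * x for f, x in zip(freq, scores))
    ss = sum(f * x * x for f, x in zip(freq, scores)) - sx * sx / n
    return n, sx / n, ss


n1, m1, ss1 = mean_and_ss(carriers)
n2, m2, ss2 = mean_and_ss(non_carriers)
df = n1 + n2 - 2
t = (m1 - m2) / math.sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))

print(f"Z = {z:.3f}; t = {t:.3f} with {df} d.f.")
```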

EXAMPLE 9.11.2 — When the regression of p_i on X_i is used as a test criterion, it is of interest to examine whether the regression is linear. Armitage (19) shows that this can be done by first computing

$$\chi^2_{(C-1)} = \frac{\sum n_i (p_i - \bar p)^2}{\bar p \bar q} = \frac{\sum a_i p_i - A^2/N}{\bar p \bar q}$$

This χ², with (C - 1) d.f., measures the total variation among the C values of p_i. The χ² for linear regression, with 1 d.f., is found by squaring Z, since the square of a normal deviate has a χ² distribution with 1 d.f. The difference, χ²_(C-1) - Z², is a χ² with (C - 2) d.f. for testing the deviations of the p_i from their linear regression on the X_i. Compute this χ² for the data in example 9.11.1. Ans. The total χ² is 7.85 with 2 d.f., while Z² is 7.19 with 1 d.f. Thus the χ² for the deviations is 0.66 with 1 d.f., in agreement with the hypothesis of linearity.
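A minimal sketch of Armitage's partition for the tonsil data follows. Carried at full precision it gives a total χ² of about 7.88 and a deviations χ² of about 0.69; the 7.85 and 0.66 in the answer appear to come from using the four-decimal p_i (our inference, not stated in the text). Z² is computed directly as Num.²/(p̄q̄ Den.).

```python
def trend_partition(a, n, x):
    """Split the (C - 1) d.f. chi-square among proportions into a 1 d.f.
    linear-regression part (Z squared) and a (C - 2) d.f. deviations part."""
    N, A = sum(n), sum(a)
    pbar = A / N
    pq = pbar * (1 - pbar)
    # sum a_i p_i = sum a_i^2 / n_i
    total = (sum(ai * ai / ni for ai, ni in zip(a, n)) - A * A / N) / pq
    sum_nx = sum(ni * xi for ni, xi in zip(n, x))
    num = sum(ai * xi for ai, xi in zip(a, x)) - A * sum_nx / N
    den = sum(ni * xi * xi for ni, xi in zip(n, x)) - sum_nx ** 2 / N
    linear = num * num / (pq * den)     # equals Z squared
    return total, linear, total - linear


carriers = [19, 29, 24]
sizes = [516, 589, 293]
scores = [0, 1, 2]

total, linear, deviations = trend_partition(carriers, sizes, scores)
print(f"total = {total:.2f} (2 d.f.), linear = {linear:.2f} (1 d.f.), "
      f"deviations = {deviations:.2f} (1 d.f.)")
```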


9.12 — Heterogeneity χ²: testing Mendelian ratios. It is often advisable to collect data in several small samples rather than in a single large one. An example is furnished by some experiments on chlorophyll inheritance in maize (1), reported in table 9.12.1. The series consisted of 11 samples of progenies of heterozygous green plants, self-fertilized, segregating into dominant green plants and recessive yellow plants. The hypothetical ratio is 3 green to 1 yellow. We shall study the proportion of yellow, theoretically 1/4.

TABLE 9.12.1
Number of Yellow Seedlings in 11 Samples of Maize

No. in Sample    No. Yellow    Proportion Yellow
     n_i            a_i          p_i = a_i/n_i
     122             24             0.1967
     149             39             0.2617
      86             18             0.2093
      55             13             0.2364
      71             17             0.2394
     179             38             0.2123
     150             30             0.2000
      36              9             0.2500
      91             21             0.2308
      53             14             0.2642
     111             26             0.2342
  N = 1103        A = 249        p̄ = 0.22575

Heterogeneity χ² (10 d.f.):

$$\chi^2 = \frac{\sum a_i p_i - A\bar p}{\bar p \bar q} = \frac{0.5779}{(0.2258)(0.7742)} = 3.31$$

Pooled χ² (1 d.f.):

$$\chi_c^2 = \frac{(|A - Np| - \tfrac12)^2}{Npq} = \frac{(|249 - 275.75| - \tfrac12)^2}{(1103)(0.25)(0.75)} = 3.33$$


The data may fail to satisfy the simple Mendelian hypothesis in two ways. First, there may be real differences among the p_i (proportion of yellow) in different samples. This finding points to some additional source of variability that must be explained before the data can be used as a crucial test of the Mendelian ratio. Second, the p_i may agree with one another (apart from sampling errors) but their overall proportion p̄ may disagree with the Mendelian proportion p. The reason may be linkage or crossing-over, or differential robustness in the dominant and recessive plants.

The first point is examined by applying to the p_i the variance test for homogeneity of the binomial distribution (section 9.8). The value of χ², shown under table 9.12.1, is 3.31, with 10 d.f., P about 0.97. The test gives no reason to suspect real differences among the p_i. We therefore pool the samples and compare the overall ratio, p̄ = 0.22575, with the hypothetical p = 0.25, by the χ² test for a binomial proportion (section 8.8). We find χ² (corrected for continuity) = 3.33, P about 0.07. There is a hint of a deficiency of the recessive yellows.
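Both tests under table 9.12.1 can be sketched directly from the counts. Here Σa_i p_i is computed as Σa_i²/n_i, and the P value for the pooled test uses the fact that a 1 d.f. χ² is the square of a normal deviate. At full precision the heterogeneity χ² comes out near 3.32 rather than 3.31, a rounding difference only.

```python
import math
from statistics import NormalDist

ns = [122, 149, 86, 55, 71, 179, 150, 36, 91, 53, 111]
yellow = [24, 39, 18, 13, 17, 38, 30, 9, 21, 14, 26]

N, A = sum(ns), sum(yellow)
pbar = A / N

# Heterogeneity chi-square, C - 1 = 10 d.f., with pbar*qbar in the denominator;
# sum(a_i * p_i) = sum(a_i**2 / n_i)
het = (sum(a * a / n for a, n in zip(yellow, ns)) - A * pbar) / (pbar * (1 - pbar))

# Pooled chi-square (1 d.f.) against the Mendelian p = 1/4, corrected for continuity
p0 = 0.25
pooled = (abs(A - N * p0) - 0.5) ** 2 / (N * p0 * (1 - p0))
# A 1 d.f. chi-square is the square of a normal deviate
p_pooled = 2 * (1 - NormalDist().cdf(math.sqrt(pooled)))

print(f"pbar = {pbar:.5f}, heterogeneity chi2 = {het:.2f}, "
      f"pooled chi2 = {pooled:.2f} (P = {p_pooled:.3f})")
```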

In showing the relation between these two tests, the following algebraic identity is of interest:

$$\frac{\sum n_i (p_i - p)^2}{pq} = \frac{(\sum n_i)(\bar p - p)^2}{pq} + \frac{\sum n_i (p_i - \bar p)^2}{pq} \qquad (9.12.1)$$

The quantity n_i(p_i - p)²/pq measures the discrepancy between the observed p_i in the ith sample and the theoretical value p. If the null hypothesis is true, this quantity is distributed as χ² with 1 d.f., and the sum of these quantities over the C samples (left side of equation 9.12.1) is distributed as χ² with C d.f. The first term on the right of (9.12.1) compares the pooled ratio p̄ with p, and is distributed as χ² with 1 d.f. The second term on the right measures the deviations of the p_i from their own pooled mean p̄, and is distributed as χ² with (C - 1) d.f. To sum up, the total χ² on the left, with C d.f., splits into a χ² with 1 d.f., which compares the pooled sample p̄ and the theoretical p, and a heterogeneity χ², with (C - 1) d.f., which compares the p_i among themselves. These χ² distributions are of course followed only approximately unless the n_i are large.

In practice, this additive feature is less useful than it might appear. Unless the pooled sample is large, a correction for continuity in the 1 d.f. for the pooled χ² is advisable. This destroys the additivity. Secondly, the expression for the heterogeneity χ² assumes that the theoretical ratio p applies in these data. If there is doubt on this point, the heterogeneity χ² should be calculated, as in table 9.12.1, with p̄q̄ in the denominator instead of pq. In this form the heterogeneity χ² involves no assumption that p̄ = p (apart from sampling errors).
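A numerical check of identity (9.12.1) on the maize data may make the additivity, and how the continuity correction breaks it, concrete. All three terms use the theoretical pq = (0.25)(0.75) as in the identity; variable names are illustrative.

```python
ns = [122, 149, 86, 55, 71, 179, 150, 36, 91, 53, 111]
yellow = [24, 39, 18, 13, 17, 38, 30, 9, 21, 14, 26]

p0 = 0.25                      # theoretical proportion of yellow
pq0 = p0 * (1 - p0)
N, A = sum(ns), sum(yellow)
pbar = A / N

left = sum(n * (a / n - p0) ** 2 for a, n in zip(yellow, ns)) / pq0       # C d.f.
pooled = N * (pbar - p0) ** 2 / pq0                                       # 1 d.f.
hetero = sum(n * (a / n - pbar) ** 2 for a, n in zip(yellow, ns)) / pq0   # C - 1 d.f.
pooled_corrected = (abs(A - N * p0) - 0.5) ** 2 / (N * pq0)

print(f"left side = {left:.3f}; pooled + heterogeneity = {pooled + hetero:.3f}")
print(f"pooled, uncorrected = {pooled:.3f}; corrected = {pooled_corrected:.3f}")
```

The uncorrected split is exact; replacing the pooled term by its corrected value (3.33 instead of about 3.46) means the two parts no longer sum to the left side.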

EXAMPLE 9.12.1 — From a population expected to segregate 1:1, four samples with the following ratios were drawn: 47:33, 40:26, 30:42, 24:34. Note the discrepancies among the sample ratios. Although the pooled χ² does not indicate any unusual departure from the theoretical ratio, you will find a large heterogeneity χ², with 3 d.f., equal to 9.01, P ≈ 0.03, for which some explanation should be sought.
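A sketch of this example computes the heterogeneity χ² with both denominators discussed above: the theoretical pq = 1/4 (which reproduces 9.01) and the sample p̄q̄ (about 9.02). The 3 d.f. upper tail uses the closed form 2[1 - Φ(√x)] + √(2x/π) e^(-x/2), valid only for 3 d.f.

```python
import math
from statistics import NormalDist

samples = [(47, 33), (40, 26), (30, 42), (24, 34)]
a = [x for x, _ in samples]
n = [x + y for x, y in samples]
N, A = sum(n), sum(a)
pbar = A / N

ss = sum(ni * (ai / ni - pbar) ** 2 for ai, ni in zip(a, n))
het_theory = ss / 0.25                          # pq from the 1:1 hypothesis
het_sample = ss / (pbar * (1 - pbar))           # pbar*qbar from the data
pooled = (abs(A - N / 2) - 0.5) ** 2 / (N * 0.25)   # corrected, 1 d.f.

# Upper-tail probability of chi-square with exactly 3 d.f.
x = het_theory
p_het = 2 * (1 - NormalDist().cdf(math.sqrt(x))) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

print(f"heterogeneity: {het_theory:.2f} (pq = 1/4), {het_sample:.2f} (pbar*qbar); "
      f"P = {p_het:.3f}; pooled = {pooled:.2f}")
```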
EXAMPLE 9.12.2 — Fisher (25) applied χ² tests to the experiments conducted by Mendel in 1863 to test different aspects of his theory, as follows:

Experiment              χ²      d.f.
Trifactorial           8.94      17
Bifactorial            2.81       8
Gametic ratios         3.67      15
Repeated 2:1 test      0.13       1

Show that in random sampling the probability of obtaining a total χ² lower than that observed is less than 0.005 (use the χ² table). More accurately, the probability is less than 1 in 2000. Thus the agreement of the results with Mendel's laws looks too good to be true. Fisher gives an interesting discussion of possible reasons.
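The "less than 1 in 2000" claim needs the lower tail of χ² with 41 d.f., which standard tables rarely give. A minimal sketch, assuming only the Python standard library, evaluates it through the power series for the regularized lower incomplete gamma function P(df/2, x/2); this is a swapped-in numerical method, not the one Fisher used.

```python
import math


def chi2_cdf(x, df):
    """Lower-tail probability P(chi2 with df d.f. <= x).

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df/2, z = x/2.
    """
    a, z = df / 2, x / 2
    if z <= 0:
        return 0.0
    term = 1.0 / a
    series = term
    k = 0
    while term > series * 1e-15:
        k += 1
        term *= z / (a + k)
        series += term
    return series * math.exp(-z + a * math.log(z) - math.lgamma(a))


fisher = {"Trifactorial": (8.94, 17), "Bifactorial": (2.81, 8),
          "Gametic ratios": (3.67, 15), "Repeated 2:1 test": (0.13, 1)}

total_chi2 = sum(c for c, _ in fisher.values())
total_df = sum(df for _, df in fisher.values())
p_lower = chi2_cdf(total_chi2, total_df)
print(f"total chi2 = {total_chi2:.2f} with {total_df} d.f.; "
      f"P(chi2 <= total) = {p_lower:.6f}")
```

The series converges for any x, though slowly when x/2 is much larger than df/2; for the lower tail at small x, as here, it needs only a handful of terms.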

9.13 — The R × C table. If each member of a sample is classified by one characteristic into R classes, and by a second characteristic into C classes, the data may be presented in a table with R rows and C columns. The entry in any of the RC cells is the number of members of the sample falling into that cell. Strand and Jessen (26) classified a random sample of farms in Audubon County, Iowa, into three classes (Owned, Rented, Mixed) according to tenure status, and into three classes (I, II, III) according to the level of soil fertility (table 9.13.1).


TABLE 9.13.1
Numbers of Farms on Three Soil Fertility Groups in Audubon County, Iowa, Classified According to Tenure

Soil                 Owned     Rented     Mixed     Total
I       f              36         67        49        152
        F           36.75      62.92     52.33
        f - F       -0.75       4.08     -3.33
II      f              31         60        49        140
        F           33.85      57.95     48.20
        f - F       -2.85       2.05      0.80
III     f              58         87        80        225
        F           54.40      93.13     77.47
        f - F        3.60      -6.13      2.53
Total                 125        214       178        517

$$\chi^2 = \sum \frac{(f - F)^2}{F} = \frac{(-0.75)^2}{36.75} + \cdots + \frac{(2.53)^2}{77.47} = 1.54, \qquad \text{d.f.} = (R - 1)(C - 1) = 4$$


Before drawing conclusions about the border totals for tenure status, this question is asked: Are the relative numbers of Owned, Rented, and Mixed farms in this county the same at the three levels of soil fertility? This question might alternatively be phrased: Is the distribution of the soil fertility levels the same for Owned, Rented, and Mixed farms? (If a little reflection does not make it clear that these two questions are equivalent, see example 9.13.1.) Sometimes the question is put more succinctly as: Is tenure status independent of fertility level?

The χ² test for the 2 × C table extends naturally to this situation. As before,

$$\chi^2 = \sum \frac{(f - F)^2}{F},$$

where f is the observed frequency in any cell and F the frequency expected if the null hypothesis of independence holds.

As before, the expected frequency for any cell is computed from the border totals in the corresponding row and column:

$$F = \frac{(\text{row total})(\text{column total})}{n} = \left(\frac{\text{row total}}{n}\right)(\text{column total})$$

Examples: For the first row,

$$\frac{\text{row total}}{n} = \frac{152}{517} = 0.29400$$

F_1 = (0.29400)(125) = 36.75
F_2 = (0.29400)(214) = 62.92
F_3 = (0.29400)(178) = 52.33

This procedure makes the computation easy with a calculating machine. For verification, notice that (i) the sum of the F in any row or column is equal to the observed total, and consequently (ii) the sum of the deviations in each row and in each column is zero.

The facts just stated dictate the number of degrees of freedom. One is free to put R - 1 expected frequencies in a column, but the remaining cell is then fixed as the column total minus the sum of the R - 1 values of F. Similarly, when we have inserted expected frequencies in this way in (C - 1) columns, the expected frequencies in the last column are fixed. Therefore d.f. = (R - 1)(C - 1).

The calculation of χ² is given in the table. Since P > 0.8, the null hypothesis is not rejected. If you do not need to examine the contribution of the individual cells to χ², up to half the time in computation can be saved by a shortcut devised by P. H. Leslie (27). This is especially useful if many tables are to be calculated.
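The R × C computation for table 9.13.1 can be sketched as below: expected frequencies from the border totals, the deviations f - F, and χ² with (R - 1)(C - 1) d.f. The function name is illustrative. The final assertions mirror the verification rule in the text, that deviations sum to zero in every row and column.

```python
import math


def rxc_chi2(table):
    """Pearson chi-square test of independence for an R x C table of counts.

    Returns (chi2, d.f., expected frequencies F, deviations f - F).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    deviations = [[f - F for f, F in zip(frow, Frow)]
                  for frow, Frow in zip(table, expected)]
    chi2 = sum(d * d / F
               for drow, Frow in zip(deviations, expected)
               for d, F in zip(drow, Frow))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, deviations


# Rows: soil fertility I, II, III; columns: Owned, Rented, Mixed
farms = [[36, 67, 49],
         [31, 60, 49],
         [58, 87, 80]]

chi2, df, F, dev = rxc_chi2(farms)
# Exact upper-tail probability; this closed form holds only for 4 d.f.
p_upper = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi2 = {chi2:.2f}, d.f. = {df}, P = {p_upper:.2f}")
```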

When χ² is significant, the next step is to study the nature of the departure from independence in more detail. Examination of the cells in which the contribution to χ² is greatest, taking note of the signs of the deviations (f - F), furnishes clues, but these are hard to interpret because the deviations in different cells are correlated. Computation of the percentage distribution of the row classification within each column, followed by a scrutiny of the changes from column to column, may be more informative. Further χ² tests may help. For instance, if the percentage distribution of the row classification appears the same in two columns, a χ² test for these two columns may confirm this. The two columns can then be combined for comparison with other columns. Examples 9.13.2 to 9.13.5 illustrate this approach.

EXAMPLE 9.13.1 — Show that if the expected distribution of the column classification is the same in every row, then the expected distribution of the row classification is the same in every column. For the ith row, let F_i1, F_i2, ..., F_iC be the expected numbers in the respective columns. Let F_i2 = a_2 F_i1, F_i3 = a_3 F_i1, ..., F_iC = a_C F_i1. Then the numbers a_2, a_3, ..., a_C must be the same in every row, since the expected distribution of the column classification is the same in every row. Now the expected row distribution in the first column is F_11, F_21, ..., F_R1. In the second column it is F_12 = a_2 F_11, F_22 = a_2 F_21, ..., F_R2 = a_2 F_R1. Since a_2 is a constant multiplier, this is the same distribution as in the first column, and similarly for any other column.

EXAMPLE 9.13.2 — In a study of the relation between blood type and disease, large samples of patients with peptic ulcer, patients with gastric cancer, and control persons free from these diseases were classified as to blood type (O, A, B, AB). In this example, the relatively small numbers of AB patients were omitted for simplicity. The observed numbers are as follows:

Blood Type    Peptic Ulcer    Gastric Cancer    Controls    Totals
O                  983              383            2892        4258
A                  679              416            2625        3720
B                  134               84             570         788
Totals            1796              883            6087        8766

Compute χ² to test the null hypothesis that the distribution of blood types is the same for the three samples. Ans. χ² = 40.54, 4 d.f., P very small.


EXAMPLE 9.13.3 — To examine this question further, compute the percentage distribution of blood types for each sample, as shown below.

Blood Type    Peptic Ulcer    Gastric Cancer    Controls
O                 54.7             43.4            47.5
A                 37.8             47.1            43.1
B                  7.5              9.5             9.4
Totals           100.0            100.0           100.0

This suggests (i) there is little difference between the blood type distributions for gastric cancer patients and controls, (ii) peptic ulcer patients differ principally in having an excess of patients of type O. Going back to the frequencies in example 9.13.2, test the hypothesis that the blood type distribution is the same for gastric cancer patients and controls. Ans. χ² = 5.64 (2 d.f.), P about 0.06.

EXAMPLE 9.13.4 — Combine the gastric cancer and control samples. (i) Test whether the distribution of A and B types is the same in this combined sample as in the peptic ulcer sample (omit the O types). Ans. χ² = 0.68 (1 d.f.), P about 0.4. (ii) Test whether the proportion of O types versus A + B types is the same for the combined sample as for the peptic ulcer sample. Ans. χ² = 34.29 (1 d.f.), P very small. To sum up, the high value of the original 4 d.f. χ² is due primarily to an excess of O types among the peptic ulcer patients.


EXAMPLE 9.13.5 — The preceding χ² tests may be summarized as follows:

Comparison                                                  d.f.      χ²
O, A, B types in gastric cancer (g) and controls (c)          2      5.64
A, B types in peptic ulcer and combined (g, c)                1      0.68
O vs. A + B types in peptic ulcer and combined (g, c)         1     34.29
Total                                                         4     40.61

The total χ², 40.61, is close to the original χ², 40.54, because we have broken down the original 4 d.f. into a series of independent operations that account for all 4 d.f. The difference between 40.61 and 40.54, however, is not just a rounding error: the two quantities differ a little algebraically.
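The whole chain of examples 9.13.2 to 9.13.5 can be reproduced with a single χ² helper, since a test of "same distribution in these samples" is just the independence χ² of the corresponding sub-table. A minimal sketch with illustrative names; samples are placed in rows here, which does not change χ².

```python
def chi2_independence(table):
    """Pearson chi-square for an R x C table (rows and columns interchangeable)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum((f - rt * ct / n) ** 2 / (rt * ct / n)
               for row, rt in zip(table, row_totals)
               for f, ct in zip(row, col_totals))


# Blood types O, A, B for each sample
ulcer = [983, 679, 134]
cancer = [383, 416, 84]
control = [2892, 2625, 570]
combined = [g + c for g, c in zip(cancer, control)]   # cancer + controls

overall = chi2_independence([ulcer, cancer, control])             # 4 d.f.
cancer_vs_control = chi2_independence([cancer, control])          # 2 d.f.
a_vs_b = chi2_independence([ulcer[1:], combined[1:]])             # 1 d.f., O omitted
o_vs_ab = chi2_independence([[ulcer[0], ulcer[1] + ulcer[2]],
                             [combined[0], combined[1] + combined[2]]])  # 1 d.f.
parts = cancer_vs_control + a_vs_b + o_vs_ab

print(f"overall (4 d.f.): {overall:.2f}")
print(f"components: {cancer_vs_control:.2f} + {a_vs_b:.2f} + {o_vs_ab:.2f} = {parts:.2f}")
```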

9.14 — Sets of 2 × 2 tables. Sometimes the task is to combine the evidence from a number of 2 × 2 tables. The same two treatments or types of subject may have been compared in different studies, and it is desired to summarize the combined data. Alternatively, the results of a single investigation are often subclassified by the levels of a factor or variable that is thought to influence the results. The data in table 9.14.1, made available by Dr. Martha Rogers (9), are of this type.

The data form part of a study of the possible relationship between complications of pregnancy of mothers and behavior problems in children. The comparison is between mothers of children in Baltimore schools who had been referred by their teachers as behavior problems and mothers of control children not so referred. For each mother it was recorded whether she had suffered any infant losses (e.g., stillbirths) prior to the birth of the child. Since these loss rates increase with the birth order of the child, as table 9.14.1 shows, and since the two samples might not be comparable in the distributions of birth orders, the data were examined separately for three birth-order classes. This is a common type of precaution.

TABLE 9.14.1
A Set of Three 2 × 2 Tables: Numbers of Mothers With Previous Infant Losses

                             No. of Mothers With:
Birth    Type of
Order    Children        Losses    No Losses    Total          % Loss          χ² (1 d.f.)
2        Problems          20         82        102 = n_11     19.6 = p_11
         Controls          10         54         64 = n_12     15.6 = p_12
         Total             30        136        166 = n_1      18.1 = p̄_1         0.42
3-4      Problems          26         41         67 = n_21     38.8 = p_21
         Controls          16         30         46 = n_22     34.8 = p_22
         Total             42         71        113 = n_2      37.2 = p̄_2         0.19
5+       Problems          27         22         49 = n_31     55.1 = p_31
         Controls          14         23         37 = n_32     37.8 = p_32
         Total             41         45         86 = n_3      47.7 = p̄_3         2.52

Each of the three 2 × 2 tables is first inspected separately. None of the χ² values in a single table, shown at the right, approaches the 5% significance level. Note, however, that in all three tables the percentage of mothers with previous losses is higher for the problem children than for the controls. We seek a test that is sensitive to a population difference that is consistently in one direction, although it may not show up clearly in the individual tables.

A simple method is to compute χ (the square root of χ²) in each table. Give any χ_i the same sign as the difference d_i = p_i1 - p_i2, and add the values. From table 9.14.1,

χ_1 + χ_2 + χ_3 = +0.650 + 0.436 + 1.587 = +2.673,

each χ_i being + because all the differences are +.

Under H0, any χ_i is a standard normal deviate; hence, the sum of the 3 χ's is a normal deviate with S.D. = √3. The test criterion is Σχ_i/√g, where g is the number of tables. In this case we have 2.673/√3 = 1.54. In the normal table, the two-tailed P value is just above 0.10. For this test the χ's should not be corrected for continuity.
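The Σχ/√g criterion is short to compute. This sketch takes each 2 × 2 table as (losses, no losses) for Problems followed by Controls, uses the uncorrected χ² as the text requires, and attaches the sign of p_i1 - p_i2 with `math.copysign`; the layout convention is our own.

```python
import math
from statistics import NormalDist


def signed_chi(a, b, c, d):
    """Signed square root of the uncorrected 2 x 2 chi-square.

    Table layout: [[a, b], [c, d]] with rows Problems, Controls and
    columns Losses, No losses; the sign follows d_i = p_i1 - p_i2.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.copysign(math.sqrt(chi2), a / (a + b) - c / (c + d))


tables = [(20, 82, 10, 54),    # birth order 2
          (26, 41, 16, 30),    # birth order 3-4
          (27, 22, 14, 23)]    # birth order 5+

chis = [signed_chi(*t) for t in tables]
z = sum(chis) / math.sqrt(len(chis))
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print([round(x, 3) for x in chis], round(z, 2), round(p_two_tailed, 3))
```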

This test is satisfactory if (i) the n_i do not vary from table to table by more than a ratio of 2 to 1, and (ii) the p_i are in the range 20% to 75%. If the n_i vary greatly, this test gives too much weight to the small tables, which have relatively poor power to reveal a falsity in the N.H. If the p's in some tables are close to zero or 100%, while others are around 50%, the population differences δ_i are likely to be related to the level of the p_i. Suppose that we are comparing the proportions of cases in which body injury is suffered in auto accidents by seat-belt wearers and non-wearers. The accidents have been classified by severity of impact into mild, moderate, severe, extreme, giving four 2 × 2 tables. Under the mild impacts, both p_11 and p_12 may be small and δ_1 also small, since injury rarely occurs with mild impact. Under extreme impact, p_41 and p_42 may both be close to 100%, making δ_4 also small. The large δ's may occur in the two middle tables where the p's are nearer 50%.

In applications of this type, two mathematical models have been used to describe how δ_i may be expected to change as p_i2 changes. One model supposes that the difference between the two populations is constant on a logit scale. The logit of a proportion p is log_e(p/q). A constant difference on the logit scale means that log_e(p_i1/q_i1) - log_e(p_i2/q_i2) is constant as p_i2 varies. The second model postulates that the difference is constant on a normal deviate (Z) scale. The value of Z corresponding to any proportion p is such that the area of a standard normal curve to the left of Z is p. For instance, Z = 0 for p = 0.5, Z = 1.282 for p = 0.9, Z = -1.282 for p = 0.1.

To illustrate the meaning of a constant difference on these transformed scales, table 9.14.2 shows the size of difference on the original percentage scale that corresponds to a constant difference on (a) the logit scale, (b) the normal deviate scale. The size of the difference was chosen to equal 20% at p_2 = 50%. Note that (i) the differences diminish towards both ends of the p scale, as in the seat-belt example, (ii) the two transformations do not differ greatly.


TABLE 9.14.2

Size of Difference $\delta = p_1 - p_2$ for a Range of Values of $p_2$

$p_2$ (%)          1     5    10    30    50    70    90    95    99
Constant logit   1.3   5.9  10.6  20.0  20.0  14.5   5.5   2.8   0.6
Constant Z       2.6   8.1  12.4  20.0  20.0  15.3   6.4   3.5   0.8
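The entries of table 9.14.2 can be recomputed from the two definitions above. The sketch below is a minimal illustration. It assumes that the table's columns are $p_2$ = 1, 5, 10, 30, 50, 70, 90, 95, 99% and that the fixed difference is set by $p_1 = 70\%$ at $p_2 = 50\%$. It uses Python's standard `statistics.NormalDist` for the normal deviate. The function names are hypothetical and are not taken from the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, s.d. 1


def logit(p):
    """Logit of a proportion: log_e(p/q), with q = 1 - p."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Proportion whose logit is x."""
    return 1.0 / (1.0 + math.exp(-x))


def constant_shift_differences(p2_values, p1_at_half=0.70):
    """Return (p2%, delta% under constant logit, delta% under constant Z).

    The shift on each transformed scale is fixed so that p1 = p1_at_half
    when p2 = 0.5, i.e. a 20-point difference at the centre by default.
    """
    shift_logit = logit(p1_at_half) - logit(0.5)
    shift_z = STD_NORMAL.inv_cdf(p1_at_half) - STD_NORMAL.inv_cdf(0.5)
    rows = []
    for p2 in p2_values:
        p1_logit = inv_logit(logit(p2) + shift_logit)
        p1_z = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p2) + shift_z)
        rows.append((100 * p2, 100 * (p1_logit - p2), 100 * (p1_z - p2)))
    return rows


P2_VALUES = [0.01, 0.05, 0.10, 0.30, 0.50, 0.70, 0.90, 0.95, 0.99]
ROWS = constant_shift_differences(P2_VALUES)
for p2_pct, d_logit, d_z in ROWS:
    print(f"p2 = {p2_pct:4.0f}%   logit: {d_logit:5.2f}   Z: {d_z:5.2f}")
```

To two decimals the values agree with table 9.14.2 after rounding. The only borderline case is the constant-Z entry at $p_2 = 90\%$, which comes out near 6.45, halfway between 6.4 and 6.5.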


A test that gives appropriate weight to tables with large $n_i$ and is sensitive if differences are constant on a logit or a $Z$ scale was developed by Cochran (9). If $\bar p_i$ is the combined percentage in the $i$th table, and

$$w_i = \frac{n_{i1} n_{i2}}{n_{i1} + n_{i2}}, \qquad d_i = p_{i1} - p_{i2},$$

we compute

$$Z = \frac{\sum w_i d_i}{\sqrt{\sum w_i \bar p_i \bar q_i}}$$

and refer to the normal table. For the data in table 9.14.1 the computations are as follows (with the $d_i$ in proportions to keep the numbers smaller).

Birth Order   $w_i$    $d_i$     $w_i d_i$   $\bar p_i$   $\bar p_i \bar q_i$   $w_i \bar p_i \bar q_i$
2             39.3    +0.040    +1.57       0.181        0.1482                5.824
3-4           27.3    +0.040    +1.09       0.372        0.2336                6.377
5+            21.1    +0.173    +3.65       0.477        0.2494                5.262
Sum                             +6.31                                         17.463
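The arithmetic of this table can be checked with a short script. This is a minimal sketch in Python. It assumes the counts of table 9.14.1: losses out of total mothers for Problems and Controls at each birth order. The Controls total of 37 in the 5+ group is taken as 14 + 23. The function name `cochran_z` is hypothetical.

```python
import math

# (losses, mothers) for Problems and for Controls, by birth order (table 9.14.1)
INFANT_LOSS = {
    "2": ((20, 102), (10, 64)),
    "3-4": ((26, 67), (16, 46)),
    "5+": ((27, 49), (14, 37)),
}


def cochran_z(tables):
    """Cochran's criterion: sum(w_i d_i) / sqrt(sum(w_i pbar_i qbar_i))."""
    numerator = 0.0
    variance = 0.0
    for (a1, n1), (a2, n2) in tables.values():
        w = n1 * n2 / (n1 + n2)           # w_i = n_i1 n_i2 / (n_i1 + n_i2)
        d = a1 / n1 - a2 / n2             # d_i = p_i1 - p_i2, as proportions
        pbar = (a1 + a2) / (n1 + n2)      # combined proportion in table i
        numerator += w * d
        variance += w * pbar * (1.0 - pbar)
    return numerator / math.sqrt(variance)


Z = cochran_z(INFANT_LOSS)
print(f"Z = {Z:.2f}")  # refer to the normal table
```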


The test criterion is $6.31/\sqrt{17.463} = 1.51$. This agrees closely with the value 1.54 found by the preceding test, for which these tables are quite suitable.

There is another way of computing this test. In the $i$th table, let $O_i$ be the observed number of losses among the Problems mothers and $E_i$ the expected number under $H_0$. For birth order 2 (table 9.14.1), $O_1 = 20$ and $E_1 = (30)(102)/166 = 18.43$. Then $O_1 - E_1 = +1.57$, which is the same as $w_1 d_1$. This result may be shown by algebra to hold in any 2 × 2 table. The criterion can therefore be written

$$Z = \frac{\sum (O_i - E_i)}{\sqrt{\sum w_i \bar p_i \bar q_i}}$$

TABLE 9.14.3

The Mantel-Haenszel Test for the Infant Loss Data in Table 9.14.1

Birth Order   $O_i$   $E_i$    $n_{i1} n_{i2} c_{i1} c_{i2} / n_i^2 (n_i - 1)$
2             20      18.43    5.858
3-4           26      24.90    6.426
5+            27      23.36    5.321
Sum           73      66.69    17.605

$$Z = \frac{|73 - 66.69| - \tfrac{1}{2}}{\sqrt{17.605}} = 1.38$$

This form of the test has been presented by Mantel and Haenszel (28, 29), with two refinements that are worthwhile when the $n$'s are small. First, the variance of $w_i d_i$ or $(O_i - E_i)$ on $H_0$ is not $w_i \bar p_i \bar q_i$ but the slightly larger quantity $n_{i1} n_{i2} \bar p_i \bar q_i / (n_{i1} + n_{i2} - 1)$. If the margins of the 2 × 2 table are $n_{i1}$, $n_{i2}$, $c_{i1}$, and $c_{i2}$, this variance can be computed as

$$\frac{n_{i1} n_{i2} c_{i1} c_{i2}}{n_i^2 (n_i - 1)}, \qquad (n_i = n_{i1} + n_{i2}),$$

a form that is convenient in small tables.

Secondly, a correction for continuity can be applied by subtracting 1/2 from the absolute value of $\sum (O_i - E_i)$. This version of the test is shown in table 9.14.3. The correction for continuity makes a noticeable difference even with samples of this size.
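The same data can be run through the Mantel-Haenszel version, which uses the exact-margin variance and the continuity correction (table 9.14.3). This is a minimal sketch that assumes the counts of table 9.14.1. `mantel_haenszel_z` is a hypothetical helper name.

```python
import math

# (losses, mothers) for Problems and for Controls, by birth order (table 9.14.1)
INFANT_LOSS = [((20, 102), (10, 64)), ((26, 67), (16, 46)), ((27, 49), (14, 37))]


def mantel_haenszel_z(tables, continuity=True):
    """Normal deviate (|sum(O_i - E_i)| - 1/2) / sqrt(sum V_i)."""
    observed = expected = variance = 0.0
    for (a1, n1), (a2, n2) in tables:
        n = n1 + n2
        c1 = a1 + a2                    # margin: mothers with losses
        c2 = n - c1                     # margin: mothers without losses
        observed += a1                  # O_i: Problems mothers with losses
        expected += n1 * c1 / n         # E_i under H0
        variance += n1 * n2 * c1 * c2 / (n * n * (n - 1))
    deviation = abs(observed - expected)
    if continuity:
        deviation = max(deviation - 0.5, 0.0)   # correction cannot go below zero
    return deviation / math.sqrt(variance)


Z_MH = mantel_haenszel_z(INFANT_LOSS)
Z_NO_CC = mantel_haenszel_z(INFANT_LOSS, continuity=False)
print(f"with correction: {Z_MH:.2f}   without: {Z_NO_CC:.2f}")
```

Without the correction the deviate is about 1.50, close to Cochran's 1.51. Subtracting the half unit brings it down to 1.38.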

The analysis of proportions is discussed further in sections 16.8–16.12.
CHAPTER TEN


One-way classifications.
Analysis of variance

10.1 — Extension from two samples to many. Statistical methods for 
two independent samples were presented in chapter 4, but the needs of the 
investigator are seldom confined to the comparison of two samples only. 
For attribute data, the extension to more than two samples was made in 
the preceding chapter. We are now ready to do the same for measure- 
ment data. 

First, recall the analysis used in the comparison of two samples. In the numerical example (section 4.9, p. 102), the comb weights of two samples of 11 chicks were compared, one sample having received sex hormone A, the other sex hormone C. Briefly, the principal steps in the analysis were as follows: (i) the mean comb weights $\bar X_1$, $\bar X_2$ were computed; (ii) the within-sample sum of squares of deviations $\sum x^2$, with 10 d.f., was found for each sample; (iii) a pooled estimate $s^2$ of the within-sample variance was obtained by adding the two values of $\sum x^2$ and dividing by the sum of the d.f., 20; (iv) the standard error of the mean difference, $\bar X_1 - \bar X_2$, was calculated as $\sqrt{2s^2/n}$, where $n = 11$ is the size of each sample; (v) finally, a test of the null hypothesis $\mu_1 = \mu_2$ and confidence limits for $\mu_1 - \mu_2$ were given by the result that the quantity

$$\frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{2s^2/n}}$$

follows the $t$-distribution with 20 d.f.

In the next section we apply this method to an experiment with four 
treatments, i.e., four independent samples. 

10.2 — An experiment with four samples. During cooking, doughnuts 
absorb fat in various amounts. Lowe (1) wished to learn if the amount 
absorbed depends on the type of fat used. For each of four fats, six 
batches of doughnuts were prepared, a batch consisting of 24 doughnuts. 
The data in table 10.2.1 are the grams of fat absorbed per batch, coded by 
deducting 100 grams to give simpler figures. Data of this kind are called 
a single or one-way classification, each fat representing one class. 

Before beginning the analysis, note that the totals for the four fats 
differ substantially, from 372 for fat 4 to 510 for fat 2. Indeed, there is a 
In a second sample of 500 adults, handled in the same manner except that they were exposed at −9°C., the numbers dead in groups of 100 were 38, 30, 30, 40, 27. The $\chi^2$ value of 5.79 again verifies the technique, showing only sampling variation from the estimated mortality of 33%.

The gratifying uniformity in the results leads one to place some confidence in the surprising finding that the death rates at −8°C. and −9°C. were markedly different. The total numbers dead in the two samples of 500 were 88 and 165. The result, $\chi^2 = 31.37$ with 1 d.f., P less than 0.0002, provides convincing evidence that a rise in mortality with the lowering of temperature from −8°C. to −9°C. is a characteristic of the population, not merely an accident of sampling.

The ease of applying a test of experimental technique makes its use 
almost a routine procedure except in highly standardized processes. It is 
necessary merely to collect the data in several small groups, chosen with 
regard to the types of experimental variation thought likely to be present, 
instead of in one mass. The additional information may modify conclusions and subsequent procedures profoundly.

In this example the sum of the three values of $\chi^2$ is 4.22 + 5.79 + 31.37 = 41.38, with 9 d.f. If the initial $\chi^2$ is calculated from the 2 × 10 contingency table formed by the complete data, its value is also found to be 41.38, with 9 d.f. This agreement between the two values is a fluke, which does not hold generally in 2 × C tables. For 2 × C and R × C tables, a method of computing the component parts so that they add to the initial total $\chi^2$ is available (16). In these data this method amounts to using the same denominator $\bar p \bar q = (0.253)(0.747)$, calculated from the total mortality, in finding all $\chi^2$ values. Instead, for the 4 d.f. $\chi^2$ at −8°C. we used $\bar p \bar q = (0.176)(0.824)$, appropriate to that part of the data, and at −9°C. we used $\bar p \bar q = (0.330)(0.670)$. The additive $\chi^2$ values give 3.24 + 6.77 + 31.37 = 41.38. However, when it has been shown that the mortality differs at −8°C. and −9°C., use of a pooled $\bar p$ for the individual homogeneity tests at −8°C. and −9°C. is invalid. The non-additive method is recommended, except in a quick preliminary look at the data.
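The figures for the −9°C sample, and for the comparison of the two temperatures, can be reproduced as follows. This is a minimal sketch that assumes groups of equal size 100, as described. The −8°C group counts are not given in this section, so its value of 4.22 is only rescaled by $(0.176)(0.824)/(0.253)(0.747)$, which is what changing the denominator amounts to. The function names are hypothetical.

```python
def homogeneity_chi2(dead, group_size, pq=None):
    """Variance test for C binomial groups of equal size n:
    chi2 = sum (a_i - n*pbar)^2 / (n * pq), with C - 1 d.f.
    By default pq is pbar*qbar computed from these groups themselves."""
    pbar = sum(dead) / (group_size * len(dead))
    if pq is None:
        pq = pbar * (1.0 - pbar)
    expected = group_size * pbar
    return sum((a - expected) ** 2 for a in dead) / (group_size * pq)


def two_sample_chi2(a1, n1, a2, n2):
    """Uncorrected chi2 (1 d.f.) comparing two binomial proportions."""
    pbar = (a1 + a2) / (n1 + n2)
    diff = a1 / n1 - a2 / n2
    return diff ** 2 / (pbar * (1.0 - pbar) * (1.0 / n1 + 1.0 / n2))


DEAD_MINUS_9 = [38, 30, 30, 40, 27]      # five groups of 100 at -9 C
PQ_TOTAL = 0.253 * 0.747                 # from total mortality, 253/1000

chi_minus9 = homogeneity_chi2(DEAD_MINUS_9, 100)                    # own pbar*qbar
chi_minus9_pooled = homogeneity_chi2(DEAD_MINUS_9, 100, PQ_TOTAL)  # additive version
chi_minus8_pooled = 4.22 * (0.176 * 0.824) / PQ_TOTAL              # rescaled from 4.22
chi_temps = two_sample_chi2(88, 500, 165, 500)

print(f"-9 C homogeneity: {chi_minus9:.2f}   (additive form {chi_minus9_pooled:.2f})")
print(f"-8 C additive form: {chi_minus8_pooled:.2f}")
print(f"-8 C vs -9 C: {chi_temps:.2f}")
print(f"additive total: {chi_minus8_pooled + chi_minus9_pooled + chi_temps:.2f}")
```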

9.10 — Ordered classifications. In the leprosy example of section 9.7,
the classes (marked improvement, moderate improvement, slight im- 
provement, stationary, worse) are an example of an ordered classification . 
Such classifications are common in the study of human behavior and 
preferences, and more generally whenever different degrees of some phe- 
nomenon can be recognized. The problem of utilizing the knowledge 
that we possess about this ordering has attracted considerable attention 
in recent years. 

With a single classification of Poisson variables, the ordering might lead us to expect that if the null hypothesis $\mu_1 = \mu_2 = \mu_3 = \cdots$ does not hold, an alternative $\mu_1 < \mu_2 < \mu_3 < \cdots$ should hold, where the subscripts represent the order. For instance, if working conditions in a factory have been classified as Excellent, Good, Fair, we might expect that if the number of defective articles per worker varies with working conditions, the order should be $\mu_1 < \mu_2 < \mu_3$. Similarly, with ordered columns in a 2 × C contingency table, the alternative $p_1 < p_2 < p_3 < \cdots$ might be expected. $\chi^2$ tests designed to detect this type of alternative have been developed by Bartholomew (17). The computations are quite simple.

Another approach, used by numerous workers (9), (18), (19), is to 
attach a score to each class so that an ordered scale is created. To illus- 
trate from the leprosy example, we assigned scores of 3, 2, 1, respectively, 
to the Marked, Moderate, and Slight Improvement classes, 0 to the Stationary class, and −1 to the Worse class. These scores are based on the judgment that the five classes constructed by the expert represent equal gradations on a continuous scale. We considered giving a score of +4 to the Marked Improvement class and −2 to the Worse class, since the
expert seemed to examine a patient at greater length before assigning him 
to one of these extreme classes, but rejected this since our impression may 
have been erroneous. 

Having assigned the scores, we may think of the leprosy data as consisting of two independent samples of 144 and 52 patients, respectively (see table 9.10.1). For each patient we have a discrete measure $X$ of his change in health, where $X$ takes only the values 3, 2, 1, 0, −1. We can estimate the average change in health for each sample, with its standard error, and can test the null hypothesis that this average change is the same in the two populations. For this test we use the ordinary two-sample $t$-test as applied to grouped data. The calculations appear in table 9.10.1. On the $X$ scale the average change in health is +1.269 for patients with much infiltration and +0.819 for those with little infiltration. The difference, $D$, is 0.450, with standard error ±0.172 (194 d.f.), computed in the usual way. The value of $t$ is 0.450/0.172 = 2.616, with P < 0.01. Contrary to the initial $\chi^2$ test, this test reveals a significantly greater amount of progress for the patients with much infiltration.
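The grouped-data $t$-test can be written directly from the frequencies of table 9.10.1. This is a minimal sketch that assumes the frequencies listed there (Little: 11, 27, 42, 53, 11; Much: 7, 15, 16, 13, 1). The helper names are hypothetical. Passing the alternative scores 4, 2, 1, 0, −2 gives the comparison asked for in example 9.10.1.

```python
import math

LITTLE = [11, 27, 42, 53, 11]   # patients, little infiltration
MUCH = [7, 15, 16, 13, 1]       # patients, much infiltration
# classes in order: Marked, Moderate, Slight, Stationary, Worse


def grouped_summary(freqs, scores):
    """Return n, mean, and sum of squared deviations for grouped data."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, scores))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, scores))
    return n, sum_fx / n, sum_fx2 - sum_fx ** 2 / n


def pooled_t(freqs_a, freqs_b, scores):
    """Two-sample t (pooled variance) for mean(b) - mean(a)."""
    n_a, mean_a, ss_a = grouped_summary(freqs_a, scores)
    n_b, mean_b, ss_b = grouped_summary(freqs_b, scores)
    df = n_a + n_b - 2
    s2 = (ss_a + ss_b) / df                          # pooled variance
    s_d = math.sqrt(s2 * (1.0 / n_a + 1.0 / n_b))    # s.e. of the difference
    return (mean_b - mean_a) / s_d, df


t_main, df = pooled_t(LITTLE, MUCH, [3, 2, 1, 0, -1])
t_alt, _ = pooled_t(LITTLE, MUCH, [4, 2, 1, 0, -2])
print(f"scores 3,2,1,0,-1: t = {t_main:.3f} ({df} d.f.)")
print(f"scores 4,2,1,0,-2: t = {t_alt:.3f}")
```

At full precision the two values come out near 2.613 and 2.545. The small differences from 2.616 and 2.549 presumably reflect rounding of intermediate quantities in the hand computation.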

The assignment of scores is appropriate when (i) the phenomenon 
in question is one that could be measured on a continuous scale if the 
instruments of measurement were good enough, and (ii) the ordered classi- 
fication can be regarded as a kind of grouping of this continuous scale, or 
as an attempt to approximate the continuous scale by a cruder scale that is 
the best we can do in the present state of knowledge. The process is 
similar to that which occurs in many surveys. The householder is shown 
five specific income classes and asked to indicate the class within which 
his income falls, without naming his actual income. Some householders 
name an incorrect class, just as an expert makes some mistakes in classification when this is difficult.

The advantage in assigning scores is that the more flexible and powerful methods of analysis that have been developed for continuous variables become available. One can begin to think of the sizes of the average differences between different groups in a study, and compare the difference between groups A and B with that between groups E and F. Regressions of the group means $\bar X$ on a further variable $Z$ can be worked out. The relative variability of different groups can be examined by computing $s$ for each group.

TABLE 9.10.1

Analysis of the Leprosy Data by Assigned Scores

(Data with assigned scores)

                                   Infiltration
                                 (No. of patients, $f$)
Change in Health        $X$        Little      Much
Marked improvement       3           11          7
Moderate improvement     2           27         15
Slight improvement       1           42         16
Stationary               0           53         13
Worse                   −1           11          1
Total: $\sum f$                     144         52

(Computations)
                                   Little      Much
$\sum fX$                            118         66
$\bar X = \sum fX / \sum f$        0.819      1.269
$\sum fX^2$                          260        140
$(\sum fX)^2 / \sum f$              96.7       83.8
$\sum fx^2$                        163.3       56.2
d.f.                                 143         51
$s^2$                              1.142      1.102

Pooled $s^2 = 1.131$

$$s_D^2 = s^2\left(\frac{1}{144} + \frac{1}{52}\right) = 0.0296, \qquad s_D = 0.172$$

$$t = \frac{D}{s_D} = \frac{1.269 - 0.819}{0.172} = 2.616, \qquad \text{d.f.} = 194, \quad P < 0.01$$

This approach assumes that the standard methods of analysis of continuous variables, like the $t$-test, can be used with an $X$ variable that is discrete and takes only a few values. As noted in section 5.8 on scales with limited values, the standard methods appear to work well enough for practical use. However, heterogeneity of variance and correlation between $s^2$ and $\bar X$ are more frequently encountered because of the discrete scale. If most of the patients in a group show marked improvement, most of their $X$'s will be 3, and $s^2$ will be small. Pooling of variances should not be undertaken without examining the individual $s^2$. In the leprosy example the two $s^2$ were 1.142 and 1.102 (table 9.10.1), and this difficulty was not present.
The chief objection to the assignment of scores is that the method is more or less arbitrary. Two investigators may assign different scores to the same set of data. In our experience, however, moderate differences between two scoring systems seldom produce marked differences in the conclusions drawn from the analysis. In the leprosy example, the alternative scores 4, 2, 1, 0, −2 give $t = 2.549$ as against $t = 2.616$ in the analysis in table 9.10.1. Some classifications present particular difficulty. If the degrees of injury to persons in accidents are recorded as slight, moderate, severe, disabling, and fatal, there seems no entirely satisfactory way of placing the last two classes on the same scale as the first three.

Several alternative principles have been used to construct scores. In 
studies of different populations of school children, K. Pearson (20) as- 
sumed that the underlying continuous variate was normally distributed 
in a standard population of school children. If the classes are regarded as 
a grouping of this normal distribution, the class boundaries for the normal 
variate are easily found. The score assigned to a class is the mean of the 
normal variate within the class. A related approach due to Bross (21) also uses a standard population but does not assume normality. The score (ridit) given to a class is the relative frequency up to the midpoint of that class in the standard population. When the experimental treatments are different doses of a toxic or protective agent in biological assay, Ipsen (22) shows how to assign scores so that the resulting variate has a linear regression on some chosen function of the dose, the ratio of the variance due to regression to the total variance being maximized. Fisher (23) assigns scores so as to maximize the F-ratio of treatments to experimental error as defined in section 10.5. The maximin method of Abelson and Tukey (24) chooses the assigned scores so as to maximize the smallest value of $r^2$, the squared correlation between the assigned scores and the true scores. The smallest value is taken over all sets of true scores that are consistent with the investigator's knowledge about the ordering of the classes. This approach, like Bartholomew's, avoids any arbitrary assumptions about the nature of the true scale.

EXAMPLE 9.10.1 — In the leprosy data, verify the value of $t = 2.549$ quoted for the scoring 4, 2, 1, 0, −2.

9.11 — Test for a linear trend in proportions. When interest is centered on the proportions $p_i$ in a 2 × C contingency table, there is another way of viewing the data. Table 9.11.1 shows the leprosy data with the assigned scores $X_i$, but in this case the variable that we analyze is $p_i$, the proportion of patients with much infiltration. The contention now is that if these patients have fared better than patients with little infiltration, the values of $p_i$ should increase as we move from the Worse class ($X = -1$) towards the Marked Improvement class ($X = 3$).

TABLE 9.11.1

Testing a Linear Regression of $p_i$ on the Score (Leprosy Data)

                                        Improvement
Degree of Infiltration   Marked   Moderate   Slight   Stationary   Worse    Total
Little                      11        27        42         53         11      144
Much ($a_i$)                 7        15        16         13          1       52
Total ($n_i$)               18        42        58         66         12      196 ($N$)
$p_i = a_i/n_i$         0.3889    0.3571    0.2759     0.1970     0.0833   0.2653 ($\bar p$)
Score $X_i$                  3         2         1          0         −1

If this is so, the regression coefficient of $p_i$ on $X_i$ should be a good test criterion. On the null hypothesis (no relation between $p_i$ and $X_i$) each $p_i$ is distributed about the same mean, estimated by $\bar p$, with variance $\bar p \bar q / n_i$. The regression coefficient $b$ is calculated as usual, except that each $p_i$ must be weighted by the sample size $n_i$ on which it is based, since its variance is inversely proportional to $n_i$. The numerator and denominator of $b$ are computed as follows:

$$\text{Num.} = \sum n_i (p_i - \bar p)(X_i - \bar X) = \sum n_i p_i X_i - \frac{(\sum n_i p_i)(\sum n_i X_i)}{\sum n_i}$$

$$= \sum a_i X_i - \frac{(\sum a_i)(\sum n_i X_i)}{N} = 66 - \frac{(52)(184)}{196} = 66 - 48.82 = 17.18$$

$$\text{Den.} = \sum n_i X_i^2 - \frac{(\sum n_i X_i)^2}{N} = 400 - \frac{(184)^2}{196} = 400 - 172.8 = 227.2$$

This gives $b = 17.18/227.2 = 0.0756$. Its standard error is

$$s_b = \sqrt{\frac{\bar p \bar q}{\text{Den.}}} = \sqrt{\frac{(0.2653)(0.7347)}{227.2}} = 0.0293$$

The normal deviate for testing the null hypothesis $\beta = 0$ is

$$Z = b/s_b = 0.0756/0.0293 = 2.580, \qquad P = 0.0098.$$

Although it is not obvious at first sight, Yates (18) showed that this regression test is essentially the same as the $t$-test in section 9.10 of the difference between the mean scores in the Little and Much infiltration classes. In this example the regression test gave $Z = 2.580$ while the $t$-test gave $t = 2.616$ (194 d.f.). The difference in results arises because the two approaches use slightly different large-sample approximations to the exact distributions of $Z$ and $t$ with these discrete data.
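The regression test needs only the counts $a_i$ and $n_i$ and the scores $X_i$. This is a minimal sketch that assumes the leprosy counts of table 9.11.1. Applying the same function to the tonsil data of example 9.11.1 gives the $Z$ asked for there. The function name is hypothetical. `NormalDist` from Python's standard library supplies the two-sided P.

```python
import math
from statistics import NormalDist


def linear_trend_z(a, n, x):
    """Regression of p_i = a_i/n_i on scores x_i, weighted by n_i.

    Returns (b, Z) with Z = b / sqrt(pbar*qbar / Den.).
    """
    big_a, big_n = sum(a), sum(n)
    sum_nx = sum(ni * xi for ni, xi in zip(n, x))
    num = sum(ai * xi for ai, xi in zip(a, x)) - big_a * sum_nx / big_n
    den = sum(ni * xi * xi for ni, xi in zip(n, x)) - sum_nx ** 2 / big_n
    pbar = big_a / big_n
    b = num / den
    return b, b / math.sqrt(pbar * (1.0 - pbar) / den)


b, z = linear_trend_z(a=[7, 15, 16, 13, 1], n=[18, 42, 58, 66, 12], x=[3, 2, 1, 0, -1])
p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
print(f"leprosy: b = {b:.4f}, Z = {z:.3f}, P = {p_two_sided:.4f}")

_, z_tonsils = linear_trend_z(a=[19, 29, 24], n=[516, 589, 293], x=[0, 1, 2])
print(f"tonsils: Z = {z_tonsils:.3f}")
```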


EXAMPLE 9.11.1 — Armitage (19) quotes the following data by Holmes and Williams for the relation in children between size of tonsils and the proportion of children who are carriers of Streptococcus pyogenes in the nose:

                          X = Score Given to Size of Tonsils
Types of Children             0         1         2       Total Children
Carriers ($a_i$)             19        29        24          72 ($A$)
Non-carriers                497       560       269        1326
Total ($n_i$)               516       589       293        1398 ($N$)
Carrier-rate ($p_i$)      0.0368    0.0492    0.0819     0.051502 ($\bar p$)

Calculate (i) the normal deviate $Z$ for testing the linear regression of the proportion of carriers on size of tonsils, and (ii) the value of $t$ for comparing the difference between the mean size of tonsils in carriers and non-carriers. Ans. (i) $Z = 2.681$, (ii) $t = 2.686$, with 1396 d.f.

EXAMPLE 9.11.2 — When the regression of $p_i$ on $X_i$ is used as a test criterion, it is of interest to examine whether the regression is linear. Armitage (19) shows that this can be done by first computing

$$\chi^2 = \frac{\sum n_i (p_i - \bar p)^2}{\bar p \bar q} = \frac{\sum a_i p_i - A^2/N}{\bar p \bar q}.$$

This $\chi^2$, with $(C - 1)$ d.f., measures the total variation among the $C$ values of $p_i$. The $\chi^2$ for linear regression, with 1 d.f., is found by squaring $Z$, since the square of a normal deviate has a $\chi^2$ distribution with 1 d.f. The difference $\chi^2_{(C-1)} - \chi^2_1$ is a $\chi^2$ with $(C - 2)$ d.f. for testing the deviations of the $p_i$ from their linear regression on the $X_i$. Compute this $\chi^2$ for the data in example 9.11.1. Ans. The total $\chi^2$ is 7.85 with 2 d.f., while $Z^2$ is 7.19 with 1 d.f. Thus the $\chi^2$ for the deviations is 0.66 with 1 d.f., in agreement with the hypothesis of linearity.

9.12 — Heterogeneity $\chi^2$ in testing Mendelian ratios. It is often advisable to collect data in several small samples rather than in a single large one. An example is furnished by some experiments on chlorophyll inheritance in maize (1), reported in table 9.12.1. The series consisted of 11 samples of progenies of heterozygous green plants, self-fertilized, segregating into dominant green plants and recessive yellow plants. The hypothetical ratio is 3 green to 1 yellow. We shall study the proportion of yellow, theoretically 1/4.

TABLE 9.12.1

Number of Yellow Seedlings in 11 Samples of Maize

No. in Sample   No. Yellow   Proportion Yellow
$n_i$           $a_i$        $p_i = a_i/n_i$
122             24           0.1967
149             39           0.2617
86              18           0.2093
55              13           0.2364
71              17           0.2394
179             38           0.2123
150             30           0.2000
36               9           0.2500
91              21           0.2308
53              14           0.2642
111             26           0.2342
$N = 1103$      $A = 249$    $\bar p = 0.22575$

Heterogeneity $\chi^2$ (10 d.f.):

$$\chi^2 = \frac{\sum a_i p_i - A\bar p}{\bar p \bar q} = \frac{0.5779}{(0.2258)(0.7742)} = 3.31$$

Pooled $\chi^2$ (1 d.f.):

$$\chi_c^2 = \frac{(|A - Np| - \tfrac{1}{2})^2}{Npq} = \frac{(|249 - 275.75| - \tfrac{1}{2})^2}{(1103)(0.25)(0.75)} = 3.33$$


The data may fail to satisfy the simple Mendelian hypothesis in two ways. First, there may be real differences among the $p_i$ (proportion of yellow) in different samples. Such a finding would point to some additional source of variability that must be explained before the data can be used as a crucial test of the Mendelian ratio. Second, the $p_i$ may agree with one another (apart from sampling errors) but their overall proportion $\bar p$ may disagree with the Mendelian proportion $p$. The reason may be linkage or crossing-over, or differential robustness in the dominant and recessive plants.

The first point is examined by applying to the $p_i$ the variance test for homogeneity of the binomial distribution (section 9.8). The value of $\chi^2$, shown under table 9.12.1, is 3.31, with 10 d.f., P about 0.97. The test gives no reason to suspect real differences among the $p_i$. We therefore pool the samples and compare the overall ratio, $\bar p = 0.22575$, with the hypothetical $p = 0.25$, by the $\chi^2$ test for a binomial proportion (section 8.8). We find $\chi^2$ (corrected for continuity) = 3.33, P about 0.07. There is a hint of a deficiency of the recessive yellows.

In showing the relation between these two tests, the following algebraic identity is of interest:

$$\frac{\sum n_i (p_i - p)^2}{pq} = \frac{(\sum n_i)(\bar p - p)^2}{pq} + \frac{\sum n_i (p_i - \bar p)^2}{pq} \qquad (9.12.1)$$

The quantity $n_i (p_i - p)^2 / pq$ measures the discrepancy between the observed $p_i$ in the $i$th sample and the theoretical value $p$. If the null hypothesis is true, this quantity is distributed as $\chi^2$ with 1 d.f., and the sum of these quantities over the $C$ samples (left side of equation 9.12.1) is distributed as $\chi^2$ with $C$ d.f. The first term on the right of (9.12.1) compares the pooled ratio $\bar p$ with $p$, and is distributed as $\chi^2$ with 1 d.f. The second term on the right measures the deviations of the $p_i$ from their own pooled mean $\bar p$, and is distributed as $\chi^2$ with $(C - 1)$ d.f. To sum up, the total $\chi^2$ on the left, with $C$ d.f., splits into a $\chi^2$ with 1 d.f., which compares the pooled sample $\bar p$ and the theoretical $p$, and a heterogeneity $\chi^2$ with $(C - 1)$ d.f., which compares the $p_i$ among themselves. These $\chi^2$ distributions are of course followed only approximately unless the $n_i$ are large.

In practice, this additive feature is less useful than it might appear. First, unless the pooled sample is large, a correction for continuity in the 1 d.f. for the pooled $\chi^2$ is advisable. This destroys the additivity. Secondly, the expression for the heterogeneity $\chi^2$ assumes that the theoretical ratio $p$ applies in these data. If there is doubt on this point, the heterogeneity $\chi^2$ should be calculated, as in table 9.12.1, with $\bar p \bar q$ in the denominator instead of $pq$. In this form the heterogeneity $\chi^2$ involves no assumption that $\bar p = p$ (apart from sampling errors).
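The two tests under table 9.12.1, and the identity (9.12.1), can be checked numerically. This is a minimal sketch that assumes the counts in table 9.12.1 and the Mendelian $p = 1/4$. The variable names are hypothetical. Computed from exact proportions, the heterogeneity $\chi^2$ is about 3.32. This is slightly above the tabled 3.31, which appears to have been obtained from the $p_i$ rounded to four decimals.

```python
N_SAMPLE = [122, 149, 86, 55, 71, 179, 150, 36, 91, 53, 111]   # n_i
N_YELLOW = [24, 39, 18, 13, 17, 38, 30, 9, 21, 14, 26]         # a_i
P_THEORY = 0.25                                               # Mendelian 3:1
Q_THEORY = 1.0 - P_THEORY

big_n, big_a = sum(N_SAMPLE), sum(N_YELLOW)
pbar = big_a / big_n
props = [a / n for a, n in zip(N_YELLOW, N_SAMPLE)]

# Heterogeneity chi2 (C - 1 d.f.) with pbar*qbar in the denominator, as in table 9.12.1
het_chi2 = sum(n * (p - pbar) ** 2 for n, p in zip(N_SAMPLE, props)) / (pbar * (1 - pbar))

# Pooled chi2 (1 d.f.) against p = 1/4, corrected for continuity
pooled_chi2 = (abs(big_a - big_n * P_THEORY) - 0.5) ** 2 / (big_n * P_THEORY * Q_THEORY)

# Identity (9.12.1): every term uses the theoretical pq and no continuity correction
pq = P_THEORY * Q_THEORY
total = sum(n * (p - P_THEORY) ** 2 for n, p in zip(N_SAMPLE, props)) / pq
first = big_n * (pbar - P_THEORY) ** 2 / pq
second = sum(n * (p - pbar) ** 2 for n, p in zip(N_SAMPLE, props)) / pq

print(f"pbar = {pbar:.5f}, heterogeneity chi2 = {het_chi2:.2f}, pooled chi2 = {pooled_chi2:.2f}")
print(f"{total:.4f} = {first:.4f} + {second:.4f}")
```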

EXAMPLE 9.12.1 — From a population expected to segregate 1:1, four samples with the following ratios were drawn: 47:33, 40:26, 30:42, 24:34. Note the discrepancies among the sample ratios. Although the pooled $\chi^2$ does not indicate any unusual departure from the theoretical ratio, you will find a large heterogeneity $\chi^2$ equal to 9.01, P ≈ 0.03, for which some explanation should be sought.
EXAMPLE 9.12.2 — Fisher (25) applied $\chi^2$ tests to the experiments conducted by Mendel in 1863 to test different aspects of his theory, as follows:

Experiment              $\chi^2$    d.f.
Trifactorial              8.94       17
Bifactorial               2.81        8
Gametic ratios            3.67       15
Repeated 2:1 test         0.13        1

Show that in random sampling the probability of obtaining a total $\chi^2$ lower than that observed is less than 0.005 (use the $\chi^2$ table). More accurately, the probability is less than 1 in 2000. Thus, the agreement of the results with Mendel's laws looks too good to be true. Fisher gives an interesting discussion of possible reasons.

9.13 — The R × C table. If each member of a sample is classified by
one characteristic into R classes, and by a second characteristic into C 
classes, the data may be presented in a table with R rows and C columns. 
The entry in any of the RC cells is the number of members of the sample 
falling into that cell. Strand and Jessen (26) classified a random sample 
of farms in Audubon County, Iowa, into three classes (Owned, Rented, 
Mixed), according to the tenure status and into three classes (I, II, III), 
according to the level of the soil fertility (table 9.13.1). 


TABLE 9.13.1

Numbers of Farms on Three Soil Fertility Groups in Audubon County, Iowa, Classified According to Tenure

Soil             Owned    Rented    Mixed    Total
I      $f$          36        67       49      152
       $F$       36.75     62.92    52.33
       $f - F$   −0.75      4.08    −3.33
II     $f$          31        60       49      140
       $F$       33.85     57.95    48.20
       $f - F$   −2.85      2.05     0.80
III    $f$          58        87       80      225
       $F$       54.40     93.13    77.47
       $f - F$    3.60     −6.13     2.53
Total              125       214      178      517

$$\chi^2 = \sum \frac{(f - F)^2}{F} = \frac{(-0.75)^2}{36.75} + \cdots + \frac{(2.53)^2}{77.47} = 1.54, \qquad \text{d.f.} = (R - 1)(C - 1) = 4$$


Before drawing conclusions about the border totals for tenure status, this question is asked: Are the relative numbers of Owned, Rented, and Mixed farms in this county the same at the three levels of soil fertility? This question might alternatively be phrased: Is the distribution of the soil fertility levels the same for Owned, Rented, and Mixed farms? (If a little reflection does not make it clear that these two questions are equivalent, see example 9.13.1.) Sometimes the question is put more succinctly as: Is tenure status independent of fertility level?

The $\chi^2$ test for the 2 × C table extends naturally to this situation. As before,

$$\chi^2 = \sum \frac{(f - F)^2}{F},$$

where $f$ is the observed frequency in any cell and $F$ the frequency expected if the null hypothesis of independence holds.

As before, the expected frequency for any cell is computed from the border totals in the corresponding row and column:

$$F = \frac{(\text{row total})(\text{column total})}{n} = \left(\frac{\text{row total}}{n}\right)(\text{column total})$$

Examples: For the first row,

$$\frac{\text{row total}}{n} = \frac{152}{517} = 0.29400$$

$F_1 = (0.29400)(125) = 36.75$
$F_2 = (0.29400)(214) = 62.92$
$F_3 = (0.29400)(178) = 52.33$

This procedure makes the computation easy with a calculating machine. 
For verification, notice that (i) the sum of the $F$ in any row or column is equal to the observed total, and consequently (ii) the sum of the deviations in each row and in each column is zero.

The facts just stated dictate the number of degrees of freedom. One is free to put $R - 1$ expected frequencies in a column, but the remaining cell is then fixed as the column total minus the sum of the $R - 1$ values of $F$. Similarly, when we have inserted expected frequencies in this way in $(C - 1)$ columns, the expected frequencies in the last column are fixed. Therefore, d.f. = $(R - 1)(C - 1)$.

The calculation of $\chi^2$ is given in the table. Since P > 0.8, the null hypothesis is not rejected. If you do not need to examine the contribution of the individual cells to $\chi^2$, up to half the time in computation can be saved by a shortcut devised by P. H. Leslie (27). This is especially useful if many tables are to be calculated.
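The expected frequencies and $\chi^2$ for table 9.13.1 follow mechanically from the border totals. This is a minimal sketch that assumes the observed counts of that table. `expected_counts` and `chi2_rxc` are hypothetical helper names. The same functions handle any R × C table given as a list of rows.

```python
def expected_counts(table):
    """F = (row total)(column total)/n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_rxc(table):
    """Return (chi2, d.f.) for the test of independence in an R x C table."""
    fitted = expected_counts(table)
    chi2 = sum(
        (f - big_f) ** 2 / big_f
        for obs_row, fit_row in zip(table, fitted)
        for f, big_f in zip(obs_row, fit_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: soil fertility I, II, III; columns: Owned, Rented, Mixed (table 9.13.1)
FARMS = [[36, 67, 49], [31, 60, 49], [58, 87, 80]]
chi2, df = chi2_rxc(FARMS)
print([round(F, 2) for F in expected_counts(FARMS)[0]])  # first-row F values
print(f"chi2 = {chi2:.2f} with {df} d.f.")
```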

When $\chi^2$ is significant, the next step is to study the nature of the departure from independence in more detail. Examination of the cells in which the contribution to $\chi^2$ is greatest, taking note of the signs of the deviations $(f - F)$, furnishes clues, but these are hard to interpret because the deviations in different cells are correlated. Computation of the percentage distribution of the row classification within each column, followed by a scrutiny of the changes from column to column, may be more informative. Further $\chi^2$ tests may help. For instance, if the percentage distribution of the row classification appears the same in two columns, a $\chi^2$ test for these two columns may confirm this. The two columns can then be combined for comparison with other columns. Examples 9.13.2, 9.13.3, 9.13.4, and 9.13.5 illustrate this approach.

EXAMPLE 9.13.1 — Show that if the expected distribution of the column classification is the same in every row, then the expected distribution of the row classification is the same in every column. For the $i$th row, let $F_{i1}, F_{i2}, \ldots, F_{iC}$ be the expected numbers in the respective columns. Let $F_{i2} = a_2 F_{i1}$, $F_{i3} = a_3 F_{i1}$, $\ldots$, $F_{iC} = a_C F_{i1}$. Then the numbers $a_2, a_3, \ldots, a_C$ must be the same in every row, since the expected distribution of the column classification is the same in every row. Now the expected row distribution in the first column is $F_{11}, F_{21}, \ldots, F_{R1}$. In the second column it is $F_{12} = a_2 F_{11}$, $F_{22} = a_2 F_{21}$, $\ldots$, $F_{R2} = a_2 F_{R1}$. Since $a_2$ is a constant multiplier, this is the same distribution as in the first column, and similarly for any other column.

EXAMPLE 9.13.2 — In a study of the relation between blood type and disease, large samples of patients with peptic ulcer, patients with gastric cancer, and control persons free from these diseases were classified as to blood type (O, A, B, AB). In this example, the relatively small numbers of AB patients were omitted for simplicity. The observed numbers are as follows:

Blood Type   Peptic Ulcer   Gastric Cancer   Controls   Totals
O                 983             383           2892      4258
A                 679             416           2625      3720
B                 134              84            570       788
Totals           1796             883           6087      8766

Compute $\chi^2$ to test the null hypothesis that the distribution of blood types is the same for the three samples. Ans. $\chi^2 = 40.54$, 4 d.f., P very small.


EXAMPLE 9.13.3 — To examine this question further, compute the percentage distribution of blood types for each sample, as shown below.

Blood Type   Peptic Ulcer   Gastric Cancer   Controls
O                54.7            43.4           47.5
A                37.8            47.1           43.1
B                 7.5             9.5            9.4
Totals          100.0           100.0          100.0

This suggests (i) there is little difference between the blood type distributions for gastric cancer patients and controls, and (ii) peptic ulcer patients differ principally in having an excess of patients of type O. Going back to the frequencies in example 9.13.2, test the hypothesis that the blood type distribution is the same for gastric cancer patients and controls. Ans. $\chi^2 = 5.64$ (2 d.f.), P about 0.06.

EXAMPLE 9.13.4 — Combine the gastric cancer and control samples. Test (i) whether the distribution of A and B types is the same in this combined sample as in the peptic ulcer sample (omit the O types). Ans. $\chi^2 = 0.68$ (1 d.f.), P > 0.3. (ii) Test whether the proportion of O types versus A + B types is the same for the combined sample as for the peptic ulcer sample. Ans. $\chi^2 = 34.29$ (1 d.f.), P very small. To sum up, the high value of the original 4 d.f. $\chi^2$ is due primarily to an excess of O types among the peptic ulcer patients.


EXAMPLE 9.13.5 — The preceding χ² tests may be summarized as follows:


Comparison                                               d.f.      χ²
O, A, B types in gastric cancer (g) and controls (c)       2      5.64
A, B types in peptic ulcer and combined (g, c)             1      0.68
O, A + B types in peptic ulcer and combined (g, c)         1     34.29
Total                                                      4     40.61


The total χ², 40.61, is close to the original χ², 40.54, because we have broken down the original
4 d.f. into a series of independent operations that account for all 4 d.f. The difference between
40.61 and 40.54, however, is not just a rounding error: the two quantities differ a little
algebraically. 
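The partition in example 9.13.5 can be checked by applying the same Pearson χ² to the three sub-tables. In this Python sketch (names ours) each sub-table is passed with samples as rows; the χ² of a table is unchanged by transposing it. Its results agree with example 9.13.5 to within one unit in the second decimal place, and the overall 4 d.f. χ² comes out at 40.54, so the small gap between the total of the parts and the original χ² reappears.

```python
def chi_square(table):
    """Pearson chi-square and d.f. for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = sum(
        (obs - r * c / grand) ** 2 / (r * c / grand)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )
    return chi2, (len(table) - 1) * (len(table[0]) - 1)


# Counts of O, A, B types in each sample (example 9.13.2).
ulcer = [983, 679, 134]
cancer = [383, 416, 84]
controls = [2892, 2625, 570]
combined = [g + c for g, c in zip(cancer, controls)]   # gastric cancer + controls

parts = [
    chi_square([cancer, controls]),                     # O, A, B: g versus c
    chi_square([ulcer[1:], combined[1:]]),              # A, B only: ulcer versus (g, c)
    chi_square([[ulcer[0], sum(ulcer[1:])],
                [combined[0], sum(combined[1:])]]),     # O versus A + B
]
overall, _ = chi_square([ulcer, cancer, controls])

for chi2, df in parts:
    print(f"{chi2:6.2f}  ({df} d.f.)")
print(f"sum of parts = {sum(c for c, _ in parts):.2f}, overall = {overall:.2f}")
```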

9.14 — Sets of 2 x 2 tables. Sometimes the task is to combine the 
evidence from a number of 2 x 2 tables. The same two treatments or 
types of subject may have been compared in different studies, and it is
desired to summarize the combined data. Alternatively, the results of a 
single investigation are often subclassified by the levels of a factor or 
variable that is thought to influence the results. The data in table 9.14.1, 
made available by Dr. Martha Rogers (in 9), are of this type. 

The data form part of a study of the possible relationship between 
complications of pregnancy of mothers and behavior problems in children. 
The comparison is between mothers of children in Baltimore schools who 
had been referred by their teachers as behavior problems and mothers of 
control children not so referred. For each mother it was recorded whether
she had suffered any infant losses (e.g., stillbirths) prior to the birth of the
child. Since these loss rates increase with the birth order of the child, as
table 9.14.1 shows, and since the two samples might not be comparable
in the distributions of birth orders, the data were examined separately for
three birth-order classes. This is a common type of precaution.


TABLE 9.14.1

A Set of Three 2 x 2 Tables: Numbers of Mothers With Previous Infant Losses

Birth    Type of       No. of Mothers with:
Order    Children      Losses    No Losses    Total         % Loss         χ² (1 d.f.)
2        Problems        20          82       102 = n₁₁     19.6 = p₁₁
         Controls        10          54        64 = n₁₂     15.6 = p₁₂
         Total           30         136       166 = n₁      18.1 = p̄₁        0.42
3-4      Problems        26          41        67 = n₂₁     38.8 = p₂₁
         Controls        16          30        46 = n₂₂     34.8 = p₂₂
         Total           42          71       113 = n₂      37.2 = p̄₂        0.19
5+       Problems        27          22        49 = n₃₁     55.1 = p₃₁
         Controls        14          23        37 = n₃₂     37.8 = p₃₂
         Total           41          45        86 = n₃      47.7 = p̄₃        2.52

Each of the three 2 x 2 tables is first inspected separately. None of
the χ² values in a single table, shown at the right, approaches the 5%
significance level. Note, however, that in all three tables the percentage of
mothers with previous losses is higher among mothers of problem children than
among mothers of controls. We seek a test sensitive to a population difference
that is consistently in one direction, although it may not show up clearly
in the individual tables.

A simple method is to compute χ (the square root of χ²) in each table.
Give any χᵢ the same sign as the difference dᵢ = pᵢ₁ − pᵢ₂, and add the
χᵢ values. From table 9.14.1,

$$\chi_1 + \chi_2 + \chi_3 = +0.650 + 0.436 + 1.587 = +2.673,$$

each χᵢ being + because all the differences are +.

Under H₀, any χᵢ is a standard normal deviate; hence, the sum of the
3 χ's is a normal deviate with S.D. = √3. The test criterion is Σχᵢ/√g,
where g is the number of tables. In this case we have 2.673/√3 = 1.54.
In the normal table, the two-tailed P value is just above 0.10. For this
test the χ's should not be corrected for continuity.
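A short Python sketch of the Σχ/√g criterion for the three tables of table 9.14.1. The helper name `signed_chi` is our own; it uses the uncorrected 2 x 2 χ², n(ad − bc)² divided by the product of the four margins, as the text requires, and takes the two-tailed P from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# Each table: (Problems losses, Problems no losses, Controls losses, Controls no losses)
tables = [(20, 82, 10, 54), (26, 41, 16, 30), (27, 22, 14, 23)]


def signed_chi(a, b, c, d):
    """Square root of the uncorrected 2 x 2 chi-square, given the sign of p1 - p2."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    diff = a / (a + b) - c / (c + d)          # p_i1 - p_i2
    return math.copysign(math.sqrt(chi2), diff)


chis = [signed_chi(*t) for t in tables]
z = sum(chis) / math.sqrt(len(chis))          # sum of chi_i over sqrt(g)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print([round(x, 3) for x in chis], round(z, 2), round(p_two_tailed, 3))
```

The individual χ values come out as 0.649, 0.435, and 1.587, agreeing with the text to within rounding, and the criterion is 1.54.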

This test is satisfactory if (i) the nᵢ do not vary from table to table by
more than a ratio of 2 to 1, and (ii) the p̄ᵢ are in the range 20% to 75%.
If the nᵢ vary greatly, this test gives too much weight to the small tables,
which have relatively poor power to reveal a falsity in the N.H. If the
p's in some tables are close to zero or 100%, while others are around 50%,
the population differences δᵢ are likely to be related to the level of the pᵢⱼ.
Suppose that we are comparing the proportions of cases in which bodily
injury is suffered in auto accidents by seat-belt wearers and non-wearers.
The accidents have been classified by severity of impact into mild, moderate,
severe, and extreme, giving four 2 x 2 tables. Under the mild impacts,
both p₁₁ and p₁₂ may be small and δ₁ also small, since injury rarely occurs
with mild impact. Under extreme impact, p₄₁ and p₄₂ may both be close
to 100%, making δ₄ also small. The large δ's may occur in the two
middle tables where the p's are nearer 50%.

In applications of this type, two mathematical models have been
used to describe how δᵢ may be expected to change as pᵢ₂ changes. One
model supposes that the difference between the two populations is constant
on a logit scale. The logit of a proportion p is logₑ(p/q). A constant
difference on the logit scale means that logₑ(pᵢ₁/qᵢ₁) − logₑ(pᵢ₂/qᵢ₂) is
constant as pᵢ₂ varies. The second model postulates that the difference is
constant on a normal deviate (Z) scale. The value of Z corresponding to
any proportion p is such that the area of a standard normal curve to the
left of Z is p. For instance, Z = 0 for p = 0.5, Z = 1.282 for p = 0.9,
Z = −1.282 for p = 0.1.

To illustrate the meaning of a constant difference on these transformed
scales, table 9.14.2 shows the size of difference on the original
percentage scale that corresponds to a constant difference on (a) the logit
scale, (b) the normal deviate scale. The size of the difference was chosen
to equal 20% at p₂ = 50%. Note that (i) the differences diminish towards
both ends of the p scale, as in the seat-belt example, and (ii) the two
transformations do not differ greatly.


TABLE 9.14.2

Size of Difference δ = p₁ − p₂ for a Range of Values of p₂

p₂ (%)            1     5    10    30    50    70    90    95    99
Constant logit   1.3   6.0  10.6  20.0  20.0  14.5   5.5   2.8   0.6
Constant Z       2.6   8.1  12.4  20.0  20.0  15.3   6.4   3.5   0.8
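The entries of table 9.14.2 can be regenerated by fixing the shift on each scale so that p₁ = 70% when p₂ = 50%, then transforming back to percentages. This Python sketch uses `statistics.NormalDist` for the normal deviate scale; the function names are ours. Its output agrees with table 9.14.2 to within 0.1 in the last digit (the 5% entry of the logit row and the 90% entry of the Z row come out 5.9 and 6.5).

```python
import math
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1


def logit(p):
    return math.log(p / (1 - p))


def inverse_logit(x):
    return 1 / (1 + math.exp(-x))


# Shifts chosen so that the difference is 20 points at p2 = 50%.
LOGIT_SHIFT = logit(0.70) - logit(0.50)
Z_SHIFT = std_normal.inv_cdf(0.70) - std_normal.inv_cdf(0.50)


def delta_logit(p2):
    """p1 - p2 when the difference is constant on the logit scale."""
    return inverse_logit(logit(p2) + LOGIT_SHIFT) - p2


def delta_z(p2):
    """p1 - p2 when the difference is constant on the normal deviate scale."""
    return std_normal.cdf(std_normal.inv_cdf(p2) + Z_SHIFT) - p2


for pct in (1, 5, 10, 30, 50, 70, 90, 95, 99):
    p2 = pct / 100
    print(f"p2 = {pct:2d}%   logit: {100 * delta_logit(p2):4.1f}   Z: {100 * delta_z(p2):4.1f}")
```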


A test that gives appropriate weight to tables with large nᵢ, and is
sensitive if differences are constant on a logit or a Z scale, was developed
by Cochran (9). If p̄ᵢ is the combined percentage in the ith table, and

$$w_i = \frac{n_{i1} n_{i2}}{n_{i1} + n_{i2}}, \qquad d_i = p_{i1} - p_{i2},$$

we compute

$$\frac{\sum w_i d_i}{\sqrt{\sum w_i \bar p_i \bar q_i}}$$

and refer to the normal table. For the data in table 9.14.1 the computations
are as follows (with the dᵢ in proportions to keep the numbers smaller).







Birth Order     wᵢ        dᵢ       wᵢdᵢ      p̄ᵢ       p̄ᵢq̄ᵢ      wᵢp̄ᵢq̄ᵢ
2              39.3     +0.040    +1.57     0.181     0.1482     5.824
3-4            27.3     +0.040    +1.09     0.372     0.2336     6.377
5+             21.1     +0.173    +3.65     0.477     0.2494     5.262
Sum                               +6.31                         17.463


The test criterion is 6.31/√17.463 = 1.51. This agrees closely with
the value 1.54 found by the Σχ test, for which these tables are quite suitable.
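Cochran's criterion takes only a few lines to compute. In this Python sketch (variable names ours) the weights and differences are carried at full precision, so the two sums come out as about 6.30 and 17.45 rather than the rounded 6.31 and 17.463 of the hand table, while the criterion still rounds to 1.51.

```python
import math

# (losses, total) for Problems, then for Controls, in each birth-order class.
strata = [((20, 102), (10, 64)), ((26, 67), (16, 46)), ((27, 49), (14, 37))]

numerator = 0.0    # sum of w_i * d_i
variance = 0.0     # sum of w_i * pbar_i * qbar_i
for (x1, n1), (x2, n2) in strata:
    w = n1 * n2 / (n1 + n2)
    d = x1 / n1 - x2 / n2                 # difference in proportions
    pbar = (x1 + x2) / (n1 + n2)          # combined proportion in the table
    numerator += w * d
    variance += w * pbar * (1 - pbar)

criterion = numerator / math.sqrt(variance)
print(round(numerator, 2), round(variance, 2), round(criterion, 2))
```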

There is another way of computing this test. In the ith table, let Oᵢ
be the observed number of Problems losses and Eᵢ the expected number
under H₀. For birth order 2 (table 9.14.1), O₁ = 20, E₁ = (30)(102)/166
= 18.43. Then (O₁ − E₁) = +1.57, which is the same as w₁d₁. This result
may be shown by algebra to hold in any 2 x 2 table. The criterion can
therefore be written

$$\frac{\sum (O_i - E_i)}{\sqrt{\sum w_i \bar p_i \bar q_i}}$$


TABLE 9.14.3

The Mantel-Haenszel Test for the Infant Loss Data in Table 9.14.1

Birth Order     Oᵢ       Eᵢ      nᵢ₁nᵢ₂cᵢ₁cᵢ₂/{nᵢ²(nᵢ − 1)}
2               20     18.43             5.858
3-4             26     24.90             6.426
5+              27     23.36             5.321
Sum             73     66.69            17.605

Z = (|73 − 66.69| − ½)/√17.605 = 1.38

This form of the test has been presented by Mantel and Haenszel
(28, 29), with two refinements that are worthwhile when the n's are small.
First, the variance of wᵢdᵢ or (Oᵢ − Eᵢ) on H₀ is not wᵢp̄ᵢq̄ᵢ but the slightly
larger quantity nᵢwᵢp̄ᵢq̄ᵢ/(nᵢ₁ + nᵢ₂ − 1). If the margins of the 2 x 2 table
are nᵢ₁, nᵢ₂, cᵢ₁, and cᵢ₂, this variance can be computed as

$$\frac{n_{i1} n_{i2} c_{i1} c_{i2}}{n_i^2 (n_i - 1)}, \qquad (n_i = n_{i1} + n_{i2}),$$

a form that is convenient in small tables.

Secondly, a correction for continuity can be applied by subtracting
1/2 from the absolute value of Σ(Oᵢ − Eᵢ). This version of the test is
shown in table 9.14.3. The correction for continuity makes a noticeable
difference even with samples of this size.
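The corrected Mantel-Haenszel statistic of table 9.14.3 can be sketched in Python as follows, using the margin form of the variance. The code and its names are ours, not taken from Mantel and Haenszel's papers. Because the Eᵢ are summed before rounding, the expected total prints as 66.70; the table's 66.69 is the sum of the rounded Eᵢ.

```python
import math

# Each stratum: (a, b, c, d) = Problems losses, Problems no losses,
#                              Controls losses, Controls no losses.
strata = [(20, 82, 10, 54), (26, 41, 16, 30), (27, 22, 14, 23)]

observed = expected = variance = 0.0
for a, b, c, d in strata:
    n1, n2 = a + b, c + d          # row margins: Problems, Controls
    c1, c2 = a + c, b + d          # column margins: losses, no losses
    n = n1 + n2
    observed += a
    expected += n1 * c1 / n
    variance += n1 * n2 * c1 * c2 / (n ** 2 * (n - 1))

# Correction for continuity: subtract 1/2 from |sum(O - E)|.
z = (abs(observed - expected) - 0.5) / math.sqrt(variance)
print(round(expected, 2), round(variance, 3), round(z, 2))
```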

The analysis of proportions is discussed further in sections 16.8-16.12.
One-way classifications.
Analysis of variance 


10.1 — Extension from two samples to many. Statistical methods for 
two independent samples were presented in chapter 4, but the needs of the 
investigator are seldom confined to the comparison of two samples only. 
For attribute data, the extension to more than two samples was made in 
the preceding chapter. We are now ready to do the same for measure- 
ment data. 

First, recall the analysis used in the comparison of two samples. In
the numerical example (section 4.9, p. 102), the comb weights of two
samples of 11 chicks were compared, one sample having received sex
hormone A, the other sex hormone C. Briefly, the principal steps in the
analysis were as follows: (i) the mean comb weights X̄₁, X̄₂ were computed,
(ii) the within-sample sum of squares of deviations Σx², with 10 d.f.,
was found for each sample, (iii) a pooled estimate s² of the within-sample
variance was obtained by adding the two values of Σx² and dividing by
the sum of the d.f., 20, (iv) the standard error of the mean difference,
X̄₁ − X̄₂, was calculated as √(2s²/n), where n = 11 is the size of each
sample, (v) finally, a test of the null hypothesis μ₁ = μ₂ and confidence
limits for μ₁ − μ₂ were given by the result that the quantity

$$\frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{2s^2/n}}$$

follows the t-distribution with 20 d.f.
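Steps (i) to (v) can be sketched in Python. The function below is our own illustration; since the individual comb weights are not reproduced in this section, it is exercised with the summary figures quoted for the same chick example in section 10.6 (sample totals 1067 and 616, pooled Σx² = 16,220, n = 11).

```python
import math


def pooled_two_sample_t(total1, total2, pooled_ss, n):
    """t for two independent samples of equal size n (steps (i)-(v)).

    pooled_ss is the sum of the two within-sample sums of squares (sum x^2).
    Returns (t, degrees_of_freedom) for testing mu1 = mu2.
    """
    mean1, mean2 = total1 / n, total2 / n        # step (i)
    df = 2 * (n - 1)                             # n - 1 d.f. from each sample
    s2 = pooled_ss / df                          # step (iii): pooled variance
    se_diff = math.sqrt(2 * s2 / n)              # step (iv)
    return (mean1 - mean2) / se_diff, df         # step (v), with mu1 - mu2 = 0


t, df = pooled_two_sample_t(1067, 616, 16220, 11)
print(round(t, 2), df)
```

This gives t = 3.38 with 20 d.f., the value quoted from table 4.9.1 in section 10.6.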

In the next section we apply this method to an experiment with four 
treatments, i.e., four independent samples. 

10.2 — An experiment with four samples. During cooking, doughnuts 
absorb fat in various amounts. Lowe (1) wished to learn if the amount 
absorbed depends on the type of fat used. For each of four fats, six 
batches of doughnuts were prepared, a batch consisting of 24 doughnuts. 
The data in table 10.2.1 are the grams of fat absorbed per batch, coded by 
deducting 100 grams to give simpler figures. Data of this kind are called 
a single or one-way classification, each fat representing one class. 

Before beginning the analysis, note that the totals for the four fats
differ substantially, from 372 for fat 4 to 510 for fat 2. Indeed, there is a
clear separation between the individual results for fats 4 and 2, the highest
value given by fat 4 being 70, while the lowest for fat 2 is 77. Every other
pair of samples, however, shows some overlap.


TABLE 10.2.1

Grams of Fat Absorbed Per Batch (Minus 100 Grams)

Fat                 1          2          3          4         Total
                   64         78         75         55
                   72         91         93         66
                   68         97         78         49
                   77         82         71         64
                   56         85         63         70
                   95         77         76         68
ΣX                432        510        456        372       1,770 = G
X̄                  72         85         76         62         295
ΣX²            31,994     43,652     35,144     23,402       134,192
(ΣX)²/n        31,104     43,350     34,656     23,064       132,174
Σx²               890        302        488        338         2,018
d.f.                5          5          5          5            20

Pooled s² = 2,018/20 = 100.9
s_D̄ = √(2s²/n) = √((2)(100.9)/6) = 5.80

Proceeding as in the case of two samples, we calculate for each sample
the mean X̄ and the sum of squares of deviations Σx², as shown under
table 10.2.1. We then form a pooled estimate s² of the within-sample
variance. Since each sample provides 5 d.f. for Σx², the pooled s² = 100.9
has 20 d.f. This pooling involves, of course, the assumption that the
variance between batches is the same for each fat. The standard error of the
mean for any fat is √(s²/6) = 4.10 grams.
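The calculations under table 10.2.1 can be reproduced with a short Python sketch (names ours):

```python
import math

# Grams of fat absorbed per batch, minus 100 (table 10.2.1).
fats = {
    1: [64, 72, 68, 77, 56, 95],
    2: [78, 91, 97, 82, 85, 77],
    3: [75, 93, 78, 71, 63, 76],
    4: [55, 66, 49, 64, 70, 68],
}


def sum_sq_dev(xs):
    """Sum of squares of deviations from the mean: sum(X^2) - (sum X)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


n = 6
means = {fat: sum(xs) / n for fat, xs in fats.items()}
pooled_ss = sum(sum_sq_dev(xs) for xs in fats.values())    # 890 + 302 + 488 + 338
pooled_df = sum(len(xs) - 1 for xs in fats.values())       # 5 d.f. per fat
s2 = pooled_ss / pooled_df
se_mean = math.sqrt(s2 / n)         # standard error of one fat mean
se_diff = math.sqrt(2 * s2 / n)     # standard error of a difference of two means
print(means, pooled_ss, pooled_df, round(s2, 1), round(se_mean, 2), round(se_diff, 2))
```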

Thus far, the only new problem is that there are four means to compare
instead of two. The comparisons that are of interest are not necessarily
confined to the differences X̄ᵢ − X̄ⱼ between pairs of means: their
exact nature will depend on the questions that the experiment is intended
to answer. For instance, if fats 1 and 2 were animal fats and fats 3 and 4
vegetable fats, we might be particularly interested in the difference
(X̄₁ + X̄₂)/2 − (X̄₃ + X̄₄)/2. A rule for making planned comparisons of
this nature is outlined in section 10.7, with further discussion in sections
10.8 and 10.9.

Before considering the comparison of means, we present an alterna- 
tive method of doing the preliminary calculations in this section. This 
method, of great utility and flexibility, is known as the analysis of variance 
and was developed by Fisher in the 1920’s. The analysis of variance per- 
forms two functions: 

1. It is an elegant and slightly quicker way of computing the pooled
s². In a single classification this advantage in speed is minor, but in the
more complex classifications studied later, the analysis of variance is
the only simple and reliable method of determining the appropriate
pooled error variance s².

2. It provides a new test, the F-test. This is a single test of the null
hypothesis that the population means μ₁, μ₂, μ₃, μ₄ for the four fats are
identical. This test is often useful in a preliminary inspection of the results
and has many subsequent applications.

EXAMPLE 10.2.1 — Here are some data selected for easy computation. Calculate the
pooled s² and state how many d.f. it has.

          Sample Number
  1       2       3       4
 11      13      21      10
  4       9      18       4
  6      14      15      19

Ans. s² = 21.5, with 8 d.f.


10.3 — The analysis of variance. In the doughnut example, suppose
for a moment that there are no differences between the average amounts
absorbed for the four fats. In this situation, all 24 observations are
distributed about a common mean μ with variance σ².

The analysis of variance develops from the fact that we can make
three different estimates of σ² from the data in table 10.2.1. Since we are
assuming that all 24 observations come from the same population, we
can compute the total sum of squares of deviations for the 24 observations
as

$$64^2 + 72^2 + 68^2 + \cdots + 70^2 + 68^2 - \frac{(1770)^2}{24} = 134{,}192 - 130{,}538 = 3654 \qquad (10.3.1)$$

This sum of squares has 23 d.f. The mean square, 3654/23 = 158.9, is
the first estimate of σ².

The second estimate is the pooled s² already obtained. Within each
fat, we computed the sum of squares between batches (890, 302, etc.),
each with 5 d.f. These sums of squares were added to give

$$890 + 302 + 488 + 338 = 2018 \qquad (10.3.2)$$

This quantity is called the sum of squares between batches within fats, or
more concisely the sum of squares within fats. The sum of squares is
divided by its d.f., 20, to give the second estimate, s² = 2,018/20 = 100.9.

For the third estimate, consider the means for the four fats, 72, 85,
76, and 62. These are also estimates of μ, but have variances σ²/6, since
they are means of samples of 6. Their sum of squares of deviations is

$$72^2 + 85^2 + 76^2 + 62^2 - \frac{(295)^2}{4} = 272.75$$

with 3 d.f. The mean square, 272.75/3, is an estimate of σ²/6.
Consequently, if we multiply by 6, we have the third estimate of σ². We shall
accomplish this by multiplying the sum of squares by 6, giving

$$6\left\{72^2 + 85^2 + 76^2 + 62^2 - \frac{(295)^2}{4}\right\} = 1636 \qquad (10.3.3)$$

the mean square being 1636/3 = 545.3.

Since the total for any fat is six times the fat mean, this sum of squares
can be computed from the fat totals as

$$\frac{432^2 + 510^2 + 456^2 + 372^2}{6} - \frac{(1770)^2}{24} = 132{,}174 - 130{,}538 = 1636 \qquad (10.3.4)$$

To verify this alternative form of calculation, note that 432²/6 = (6 × 72)²/6
= 6(72)², while (1770)²/24 = (6 × 295)²/24 = 6(295)²/4. This sum of
squares is called the sum of squares between fats.

Now list the d.f. and the sums of squares in (10.3.3), (10.3.2), and 
(10.3.1) as follows: 


Source of Variation              Degrees of Freedom    Sum of Squares
Between fats                              3                 1,636
Between batches within fats              20                 2,018
Total                                    23                 3,654


Notice a new and important result: the d.f. and the sums of squares for 
the two components (between fats and within fats) add to the correspond- 
ing total figures. These results hold in any single classification. The 
result for the d.f. is not hard to verify. With a classes and n observations
per class, the d.f. are (a − 1) for Between fats, a(n − 1) for Within fats,
and (an − 1) for the total. But

$$(a - 1) + a(n - 1) = a - 1 + an - a = an - 1$$

The result for the sums of squares follows from an algebraic identity 
(example 10.3.5). Because of this relation, the standard practice in the 
analysis of variance is to compute only the total sum of squares and the 
sum of squares Between fats. The sum of squares Within fats, leading to 
the pooled s 2 , is obtained by subtraction. 

Table 10.3.1 shows the usual analysis of variance table for the doughnut
data, with general computing instructions for a classes (fats) with n
observations per class. The symbol T denotes a typical class total, while
G = ΣT = ΣΣX (summed over both rows and columns) is the grand total.
The first step is to calculate the correction for the mean,

$$C = \frac{G^2}{an} = \frac{(1770)^2}{24} = 130{,}538$$

This is done because C occurs both in formula (10.3.1) for the total sum 
of squares and in formula (10.3.4) for the sum of squares between fats. 
The remaining steps should be clear from table 10.3.1. 


TABLE 10.3.1

Formulas for Calculating the Analysis of Variance Table
(Illustrated by the Doughnut Data)

Source of Variation       Degrees of Freedom    Sum of Squares           Mean Square
Between classes (fats)    a − 1 = 3             ΣT²/n − C = 1,636          545.3
Within classes (fats)     a(n − 1) = 20         Subtract = 2,018           100.9
Total                     an − 1 = 23           ΣΣX² − C = 3,654



Since the analysis of variance table is unfamiliar at first, the beginner 
should work a number of examples. The role of the mean square between 
fats, which is needed for the F-test, is explained in the next section. 
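The computing formulas of table 10.3.1 translate directly into code. In this Python sketch (our own function, for equal numbers n per class) the correction term C is kept unrounded at 130,537.5, so the Between fats and Total sums of squares come out as 1,636.5 and 3,654.5, half a unit above the rounded figures in the table, and F comes out as 5.41 rather than 5.40; the Within sum of squares is exactly 2,018 either way.

```python
def one_way_anova(groups):
    """One-way analysis of variance by the computing formulas of table 10.3.1.

    groups: list of classes, each a list of n observations (equal n assumed).
    """
    a, n = len(groups), len(groups[0])
    totals = [sum(g) for g in groups]                  # class totals T
    grand = sum(totals)                                # G
    c = grand ** 2 / (a * n)                           # correction C = G^2/an
    total_ss = sum(x * x for g in groups for x in g) - c
    between_ss = sum(t * t for t in totals) / n - c
    within_ss = total_ss - between_ss                  # obtained by subtraction
    between_ms = between_ss / (a - 1)
    within_ms = within_ss / (a * (n - 1))
    return {
        "between": (a - 1, between_ss, between_ms),
        "within": (a * (n - 1), within_ss, within_ms),
        "total": (a * n - 1, total_ss),
        "F": between_ms / within_ms,
    }


doughnuts = [
    [64, 72, 68, 77, 56, 95],
    [78, 91, 97, 82, 85, 77],
    [75, 93, 78, 71, 63, 76],
    [55, 66, 49, 64, 70, 68],
]
result = one_way_anova(doughnuts)
for source in ("between", "within", "total"):
    print(source, result[source])
print("F =", round(result["F"], 2))
```

Given the four samples of example 10.2.1, the same function returns Between 186, Within 172, and a pooled s² of 21.5, as in example 10.3.1.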

EXAMPLE 10.3.1 — From the formulas in table 10.3.1, compute the analysis of variance
for the simple data in example 10.2.1. Verify that you obtain 21.5 for the pooled s², as
found by the method of example 10.2.1.

Source of Variation    d.f.    Sum of Squares    Mean Square
Between samples          3          186              62.0
Within samples           8          172              21.5
Total                   11          358              32.5


EXAMPLE 10.3.2 — As part of a larger experiment (2), three levels of vitamin B₁₂ were
compared, each level being fed to three different pigs. The average daily gains in weight of
the pigs (up to 75 lbs. live weight) were as follows:

Level of B₁₂ (mg/lb ration)
   5        10        20
 1.52      1.63      1.44
 1.56      1.57      1.52
 1.54      1.54      1.63

Analyze the variance as follows:

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Between levels                 2                 0.0042            0.0021
Within levels                  6                 0.0232            0.0039
Total                          8                 0.0274            0.0034

Hint: If you subtract 1.00 from each gain (or 1.44 if you prefer it), you will save time.
Subtraction of a common figure from every observation does not alter any of the results in
the analysis of variance table.
EXAMPLE 10.3.3 — In table 9.4.1 there were recorded the number of loopers (insect
larvae) on 50 cabbage plants per plot after the application of five treatments to each of four
plots. The numbers were:

Treatment
  1     2     3     4     5
 11     6     8    14     7
  4     4     6    27     4
  4     3     4     8     9
  5     6    11    18    14

With counts like these, there is some question whether the assumptions required for the
analysis of variance are valid. But for illustration, analyze the variance as follows:

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Between treatments             4                359.30            89.82
Within treatments             15                311.25            20.75
Total                         19                670.55



EXAMPLE 10.3.4 — The percentage of clean wool in seven bags was estimated by
taking three batches at random from each bag. The percentages of clean wool in the batches
were as follows:

Bag Number
   1       2       3       4       5       6       7
 41.8    33.0    38.5    43.7    34.2    32.6    36.2
 38.9    37.5    35.9    38.9    38.6    38.4    33.4
 36.1    33.1    33.9    36.3    40.2    34.8    37.9

Calculate the mean squares for bags (11.11) and batches within bags (8.22).

EXAMPLE 10.3.5 — To prove the result that the sums of squares within and between
classes add to the total sum of squares, we use a notation that has become common for this
type of data. Let Xᵢⱼ be the observation for the jth member of the ith class. Xᵢ. is the total
of the ith class and X.. the grand total.

The sum of squares within the ith class is

$$\sum_{j=1}^{n} X_{ij}^2 - \frac{X_{i\cdot}^2}{n}$$

On adding this quantity over all classes to get the numerator of the pooled s², we obtain,
for the sum of squares within classes,

$$\sum_{i=1}^{a} \sum_{j=1}^{n} X_{ij}^2 - \sum_{i=1}^{a} \frac{X_{i\cdot}^2}{n} \qquad (1)$$

The sum of squares between classes is computed as

$$\sum_{i=1}^{a} \frac{X_{i\cdot}^2}{n} - \frac{X_{\cdot\cdot}^2}{an} \qquad (2)$$

The sum of (1) and (2) gives

$$\sum_{i=1}^{a} \sum_{j=1}^{n} X_{ij}^2 - \frac{X_{\cdot\cdot}^2}{an}$$

But this is the total sum of squares of deviations from the overall mean.
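The identity can also be checked numerically. This Python sketch (ours) computes each sum of squares from its definition as squared deviations, rather than from the shortcut formulas, for the looper counts of example 10.3.3, taking the fourth-treatment count in the last plot as 18, the value consistent with the answers 359.30 and 670.55 given there.

```python
import math
from statistics import fmean

# Looper counts, one list per treatment (example 10.3.3).
loopers = [
    [11, 4, 4, 5],
    [6, 4, 3, 6],
    [8, 6, 4, 11],
    [14, 27, 8, 18],
    [7, 4, 9, 14],
]

grand_mean = fmean(x for g in loopers for x in g)
class_means = [fmean(g) for g in loopers]

# Within: deviations from each class mean. Between: class means from the grand mean.
within = sum((x - m) ** 2 for g, m in zip(loopers, class_means) for x in g)
between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(loopers, class_means))
total = sum((x - grand_mean) ** 2 for g in loopers for x in g)

print(round(within, 2), round(between, 2), round(total, 2))
print(math.isclose(within + between, total))
```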

10.4 — Effect of differences between the population means. If the
population means for the four fats are identical, we have seen that the
mean square between fats, 545.3, and the mean square within fats, 100.9,
are both estimates of the population variance σ². What happens when the
population means are different? In order to illustrate from a simple
example in which you can easily verify the calculations, we drew (using
a table of random normal deviates) six observations normally distributed
with population mean μ = 5 and σ = 1. These were arranged in three
sets of two observations, to simulate an experiment with a = 3 treatments
and n = 2 observations per treatment.


TABLE 10.4.1

A Simulated Experiment With Three Treatments and
Two Observations Per Treatment

Case I. μ₁ = μ₂ = μ₃ = 5

            Data                             Analysis of Variance
Treatment    1      2      3                      d.f.    S.S.    M.S.
            4.6    3.3    6.3       Treatments      2     1.66    0.83
            5.2    4.7    4.2       Error           3     3.37    1.12
Total       9.8    8.0   10.5       Total           5     5.03

Case II. μ₁ = 4, μ₂ = 5, μ₃ = 7

Treatment    1      2      3                      d.f.    S.S.    M.S.
            3.6    3.3    8.3       Treatments      2    14.53    7.26
            4.2    4.7    6.2       Error           3     3.37    1.12
Total       7.8    8.0   14.5       Total           5    17.90

Case III. μ₁ = 3, μ₂ = 5, μ₃ = 9

Treatment    1      2      3                      d.f.    S.S.    M.S.
            2.6    3.3   10.3       Treatments      2    46.06   23.03
            3.2    4.7    8.2       Error           3     3.37    1.12
Total       5.8    8.0   18.5       Total           5    49.43
The data and the analysis of variance appear as Case I at the top of
table 10.4.1. In the analysis of variance table, the Between classes sum of
squares is labeled Treatments, and the Within classes sum of squares is
labeled Error. This terminology is common in planned experiments.
The mean squares, 0.83 for Treatments and 1.12 for Error, are both
estimates of σ² = 1.

In Case II we subtracted 1 from each observation for treatment 1
and added 2 to each observation for treatment 3. This simulates an
experiment with real differences in the effects of the treatments, the
population means being μ₁ = 4, μ₂ = 5, μ₃ = 7. In the analysis of variance,
notice that the Error sum of squares and mean square are unchanged.
This should not be surprising, because the Error S.S. is the pooled Σx²
within treatments, and adding a constant to (or subtracting one from) all
the observations in a treatment has no effect on Σx². The Treatments mean
square has, however, increased from 0.83 in Case I to 7.26 in Case II.

Case III represents an experiment with larger differences between
treatments. Each original observation for treatment 1 was reduced by 2,
and each observation for treatment 3 was increased by 4. The means are
now μ₁ = 3, μ₂ = 5, μ₃ = 9. As before, the Error mean square is
unchanged. The Treatments mean square has increased to 23.03. Note
that the samples for the three treatments have now moved apart, so that
there is no overlap.

When the means differ, it can be proved that the Treatments mean
square is an unbiased estimate of

$$\sigma^2 + \frac{n}{a - 1} \sum_{i=1}^{a} (\mu_i - \bar\mu)^2 \qquad (10.4.1)$$

where μ̄ is the mean of the a population means μᵢ. In Case II, with
μᵢ = 4, 5, 7, Σ(μᵢ − μ̄)² is 4.67, while n and (a − 1) are both 2 and
σ² = 1, so that (10.4.1) becomes 1 + 4.67 = 5.67. Thus the Treatments
mean square, 7.26, is an unbiased estimate of 5.67. If we drew a large
number of samples and calculated the Treatments mean square for Case II
for each sample, their average should be close to 5.67.

In Case III, Σ(μᵢ − μ̄)² is 18.67, so that the Treatments mean square,
23.03, is an estimate of the population value 19.67.
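Formula (10.4.1) can be checked by simulation, in the spirit of the text's suggestion to draw a large number of samples. In the Python sketch below, the function names, the 20,000 replicates, and the seed are our choices, and `random.Random.gauss` stands in for the table of random normal deviates. It averages the Treatments mean square over repeated samples for Case II and compares the average with the formula.

```python
import random
from statistics import fmean


def treatments_ms(samples):
    """Between-treatments mean square for equal-sized samples."""
    a, n = len(samples), len(samples[0])
    means = [fmean(s) for s in samples]
    grand = fmean(means)
    return n * sum((m - grand) ** 2 for m in means) / (a - 1)


def expected_treatments_ms(mus, n, sigma):
    """Formula (10.4.1): sigma^2 + n * sum (mu_i - mu_bar)^2 / (a - 1)."""
    mu_bar = fmean(mus)
    return sigma ** 2 + n * sum((mu - mu_bar) ** 2 for mu in mus) / (len(mus) - 1)


def average_treatments_ms(mus, n, sigma, reps=20_000, seed=1):
    """Average Treatments mean square over reps simulated experiments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        total += treatments_ms(samples)
    return total / reps


case_ii = [4, 5, 7]
print(round(expected_treatments_ms(case_ii, n=2, sigma=1), 2))   # formula (10.4.1)
print(round(average_treatments_ms(case_ii, n=2, sigma=1), 2))    # simulation
```

The formula line prints 5.67; the simulated average should land close to it, with a standard error of roughly 0.02 for 20,000 replicates.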

10.5 — The variance ratio, F. These results suggest that the quantity

$$F = \frac{\text{Treatments mean square}}{\text{Error mean square}} = \frac{\text{Mean square between classes}}{\text{Mean square within classes}}$$

should be a good criterion for testing the null hypothesis that the
population means are the same in all classes. The value of F should be around
1 when the null hypothesis holds, and should become large when the μᵢ
differ substantially. The distribution was first tabulated by Fisher in the
form z = ½ logₑ F. In honor of Fisher, the criterion was named F by
Snedecor (3). Fisher and Yates (4) designate F as the variance ratio.

In Case I, F is 0.83/1.12 = 0.74. In Case II, F increases to 7.26/1.12
= 6.48 and in Case III to 23.03/1.12 = 20.56. When you have learned
how to read the F-table, you will find that in Case II, F, which has 2 and
3 degrees of freedom, is significant at the 10% level but not at the 5% level.
In Case III, F is significant at the 5% level.

To give some idea of the distribution of F when the null hypothesis
holds, a sampling experiment was conducted. Sets of 100 observations
were drawn at random from the table of pig gains (table 3.2.1, p. 67),
which simulates a normal population with μ = 30, σ = 10. Each set was
divided into a = 10 classes, each with n = 10 observations. The F ratio
therefore has 9 d.f. in the numerator and 90 d.f. in the denominator.


TABLE 10.5.1

Distribution of F in 100 Samples From Table 3.2.1
(Degrees of freedom 9 and 90)

Class Interval    Frequency    Class Interval    Frequency
0.00-0.24             7          1.50-1.74            5
0.25-0.49            16          1.75-1.99            2
0.50-0.74            16          2.00-2.24            4
0.75-0.99            26          2.25-2.49            2
1.00-1.24            11          2.50-2.74            2
1.25-1.49             8          2.75-2.99            1


Table 10.5.1 displays the sampling distribution of 100 values of F.
One notices first the skewness: a concentration of small values and a long
tail of larger values. Next, observe that 65 of the F are less than 1. If
you remember that both terms of the ratio are estimates of σ², you may
be surprised that 1 is not the median. The mean, calculated as with
grouped data, is 0.96; the theoretical mean is slightly greater than 1.
Finally, 5% of the values lie beyond 2.25 and 1% beyond 2.75, so that these
points are estimates of the 5% and 1% levels of the theoretical distribution.

Table A 14, Part I, contains the theoretical 5% and 1% points of F for
convenient combinations of degrees of freedom. Across the top of the
table is found f₁, the degrees of freedom corresponding to the number of
treatments (classes): f₁ = a − 1. At the left is f₂, the degrees of freedom for
individuals, a(n − 1). Since the F-table is extensively used, table A 14,
Part II, gives the 25%, 10%, 2.5%, and 0.5% levels.

To find the 5% and 1% points for the sampling experiment, look in the
column headed by f₁ = 9 and down to the rows f₂ = 80 and 100. The
required points are 1.98 and 2.62, halfway between those in the table. To be
compared with these are the points experimentally obtained in table 10.5.1,
2.25 and 2.75; not bad estimates from a sample of 100 experiments. In
order to check the sampling distribution more exactly, we went back to
the original calculations and found 8% of the sample F's beyond the 5%
point and 2% beyond the 1%. This gives some idea of the variation to be
encountered in sampling. 
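A Python sketch of the sampling experiment behind table 10.5.1. It substitutes Python's normal random number generator (μ = 30, σ = 10) for the table of pig gains and runs 2,000 sets instead of 100; both choices, the seed, and the function names are ours. It also recomputes the grouped mean of table 10.5.1 from the class midpoints (0.9625, which rounds to the 0.96 quoted) and compares the simulated mean with the theoretical mean of F, f₂/(f₂ − 2) = 90/88 ≈ 1.02.

```python
import random
from statistics import fmean


def f_ratio(groups):
    """Mean square between classes divided by mean square within classes."""
    a, n = len(groups), len(groups[0])
    means = [fmean(g) for g in groups]
    grand = fmean(means)
    between_ms = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    within_ss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return between_ms / (within_ss / (a * (n - 1)))


rng = random.Random(2024)
a, n, reps = 10, 10, 2000
f_values = [
    f_ratio([[rng.gauss(30, 10) for _ in range(n)] for _ in range(a)])
    for _ in range(reps)
]

f1, f2 = a - 1, a * (n - 1)                     # 9 and 90 d.f.
theoretical_mean = f2 / (f2 - 2)
simulated_mean = fmean(f_values)
share_below_one = sum(f < 1 for f in f_values) / reps

# Grouped mean of the 100 F values in table 10.5.1 (class width 0.25).
frequencies = [7, 16, 16, 26, 11, 8, 5, 2, 4, 2, 2, 1]
midpoints = [0.125 + 0.25 * k for k in range(len(frequencies))]
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)

print(f"F({f1}, {f2}): theoretical mean {theoretical_mean:.3f}, "
      f"simulated mean {simulated_mean:.3f}, share below 1 {share_below_one:.2f}, "
      f"grouped mean of table 10.5.1 {grouped_mean}")
```

As in the text, more than half of the simulated F values fall below 1 even though their mean exceeds 1.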

For the doughnut experiment, the hypothesis set up (that the batches
are random samples from populations with the same μ) may be judged by
means of table A 14. From the analysis of variance in table 10.3.1,

$$F = 545.3/100.9 = 5.40$$

For f₁ = 3 and f₂ = 20, the 1% point in the new table is 4.94. Thus from
the distribution specified in the hypothesis there is less than one chance
in 100 of drawing a sample having a larger value of F. Evidently the
samples come from populations with different μ's. The conclusion is that
the fats have different capabilities for being absorbed by doughnuts.


EXAMPLE 10.5.1 — Four tropical feedstuffs were each fed to a lot of 5 baby chicks (9).
The gains in weight were:

Lot 1     55     49     42     21     52
Lot 2     61    112     30     89     63
Lot 3     42     97     81     95     92
Lot 4    169    137    169     85    154

Analyze the variance and test the equality of the μ's. Ans. Mean squares: (i) lots, 8,745;
(ii) chicks within lots, 722. F = 12.1. Since the sample F is far beyond the tabular 1% point,
there is little doubt that the feedstuff populations have different μ's.

EXAMPLE 10.5.2 — In the wool data of example 10.3.4, test the hypothesis that the
bags are all from populations with a common mean. Ans. F = 1.35, F₀.₀₅ = 2.85. There
is not strong evidence against the hypothesis: the bags may all have the same percentage of
clean wool.

EXAMPLE 10.5.3 — In the vitamin B₁₂ experiment of example 10.3.2, the mean gains
for the three levels differ less than is to be expected from the mean square within levels.
Although there is no reason for computing it, the value of F is 0.54. There is, of course, no
evidence of differences among the μ's.

EXAMPLE 10.5.4 — In example 10.3.3, test the hypothesis that the treatments have no
effect on the number of loopers. Ans. F = 4.33. What do you conclude?


10.6 — Analysis of variance with only two classes. When there are only
two classes, the F-test is equivalent to the t-test which we used in chapter
4 to compare the two means. With two classes, the relation F = t² holds.
We shall verify this by computing the analysis of variance for the numerical
example in table 4.9.1, p. 103. The pooled s² = 16,220/20 = 811
has already been computed in table 4.9.1. To complete the analysis of
variance, compute the Between samples sum of squares. Since the sample
totals were 1067 and 616, with n = 11, the sum of squares is

$$\frac{(1067)^2 + (616)^2}{11} - \frac{(1683)^2}{22} = 9245.5 \qquad (10.6.1)$$

With only two samples, this sum of squares is obtained more quickly as

$$\frac{(\Sigma X_1 - \Sigma X_2)^2}{2n} = \frac{(1067 - 616)^2}{(2)(11)} = 9245.5 \qquad (10.6.2)$$
TABLE 10.6.1

Analysis of Variance of Chick Experiment, Table 4.9.1

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Between samples                1                 9,245.5          9,245.5
Within samples                20                16,220.0            811.0

F = 9,245.5/811.0 = 11.40          √F = 3.38 = t



Table 10.6.1 shows the analysis of variance and the value of F, 11.40. 

Note that $\sqrt{F} = 3.38$, the value of t found in table 4.9.1. Further,
in the F table with $f_1 = 1$, the significance levels are the squares of those
in the t table for the same $f_2$. While it is a matter of choice which one is
used, the fact that we are nearly always interested in the size and direction
of the difference $(\bar X_1 - \bar X_2)$ favors the t-test.
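The arithmetic of equations (10.6.1) and (10.6.2), and the relation $F = t^2$, can be checked with a short Python sketch. It uses only the two sample totals, n = 11, and the pooled $s^2 = 811$ quoted above; the variable names are illustrative.

```python
import math

# Sample totals and common sample size for the chick data of table 4.9.1.
total_1, total_2 = 1067, 616
n = 11
pooled_s2 = 16_220 / 20  # Within-samples mean square, 20 d.f.

# Equation (10.6.1): squared totals over n, minus the correction term.
grand = total_1 + total_2
ss_between_full = (total_1**2 + total_2**2) / n - grand**2 / (2 * n)

# Equation (10.6.2): the two-sample shortcut.
ss_between_short = (total_1 - total_2) ** 2 / (2 * n)

# With 1 d.f. between samples, the mean square equals the sum of squares.
f_ratio = ss_between_short / pooled_s2
t_value = (total_1 / n - total_2 / n) / math.sqrt(2 * pooled_s2 / n)

print(round(ss_between_full, 1), round(ss_between_short, 1))  # 9245.5 9245.5
print(round(f_ratio, 2), round(math.sqrt(f_ratio), 2), round(t_value, 2))  # 11.4 3.38 3.38
```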

EXAMPLE 10.6.1 — Hansberry and Richardson (5) gave the percentages of wormy
apples on two groups of 12 trees each. Group A, sprayed with lead arsenate, had 19, 26,
22, 13, 26, 25, 38, 40, 36, 12, 16, and 8% of apples wormy. Those of group B, sprayed with
calcium arsenate and buffer materials, had 36, 42, 20, 43, 47, 49, 59, 37, 28, 49, 31, and 39%
wormy. Compute the mean square Within samples, 111.41, with 22 d.f., and that Between
samples, 1650.04, with 1 d.f. Then,

F = 1650.04/111.41 = 14.8

Next, test the significance of the difference between the sample means as in section 4.9. The
value of t is 3.85 = $\sqrt{14.8}$.
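A minimal Python sketch of example 10.6.1, working from the raw percentages. The fourth value for group A is taken as 13, which is the value that reproduces the stated mean squares 111.41 and 1650.04; the helper name `sum_sq_dev` is illustrative.

```python
import math

# Percent wormy apples, 12 trees per group (example 10.6.1).
group_a = [19, 26, 22, 13, 26, 25, 38, 40, 36, 12, 16, 8]   # lead arsenate
group_b = [36, 42, 20, 43, 47, 49, 59, 37, 28, 49, 31, 39]  # calcium arsenate + buffer

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

n = len(group_a)
ms_within = (sum_sq_dev(group_a) + sum_sq_dev(group_b)) / (2 * (n - 1))  # 22 d.f.
ms_between = (sum(group_a) - sum(group_b)) ** 2 / (2 * n)                # 1 d.f.
f_ratio = ms_between / ms_within

# Ordinary two-sample t with the pooled variance.
mean_diff = sum(group_b) / n - sum(group_a) / n
t_value = mean_diff / math.sqrt(2 * ms_within / n)

print(round(ms_within, 2), round(ms_between, 2))  # 111.41 1650.04
print(round(f_ratio, 1), round(t_value, 2))       # 14.8 3.85
```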

EXAMPLE 10.6.2 — For $f_1 = 1$, $f_2 = 20$, verify that the 5% and 1% significance levels
of F are the squares of those of t with 20 d.f.

EXAMPLE 10.6.3 — Prove that the methods used in equations (10.6.1) and (10.6.2) in
the text for finding the Between samples sum of squares, 9245.5, are equivalent.

EXAMPLE 10.6.4 — From equation (10.6.2) it follows that $F = t^2$. For
$F = (\Sigma X_1 - \Sigma X_2)^2/2ns^2$, while $t = (\bar X_1 - \bar X_2)/\sqrt{2s^2/n}$. Since $\bar X_1 = \Sigma X_1/n$, $\bar X_2 = \Sigma X_2/n$, we
have $t = (\Sigma X_1 - \Sigma X_2)/\sqrt{2ns^2} = \sqrt{F}$.

10.7 — Comparisons among class means. The analysis of variance is
only the first step in studying the results. The next step is to examine the 
class means and the sizes of differences among them. 

Often, particularly in controlled experiments, the investigator plans 
the experiment in order to estimate a limited number of specific quantities. 
For instance, in part of an experiment on sugar beet, the three treatments
(classes) were: (i) mineral fertilizers (PK) applied in April one week before
sowing, (ii) PK applied in December before winter ploughing, (iii) no
minerals. The mean yields of sugar in cwt. per acre were as follows:

PK in April, $\bar X_1 = 68.8$; PK in December, $\bar X_2 = 66.8$; no PK, $\bar X_3 = 62.4$

The objective is to estimate two quantities: 

Average effect of PK: $\tfrac{1}{2}(\bar X_1 + \bar X_2) - \bar X_3 = 67.8 - 62.4 = 5.4$ cwt.

April minus December application: $\bar X_1 - \bar X_2 = 2.0$ cwt.
A rule for finding standard errors and confidence limits of estimates
of this type will now be given. Both estimates are linear combinations of
the means, each mean being multiplied by a number. In the first estimate,
the numbers are 1/2, 1/2, -1. In the second, they are 1, -1, 0, where we
put 0 because $\bar X_3$ does not appear. Further, in each estimate, the sum
of the numbers is zero. Thus,

$$(\tfrac{1}{2}) + (\tfrac{1}{2}) + (-1) = 0; \qquad (1) + (-1) + (0) = 0$$

Definition. Any linear combination,

$$L = \lambda_1 \bar X_1 + \lambda_2 \bar X_2 + \cdots + \lambda_k \bar X_k,$$

where the λ's are fixed numbers, is called a comparison of the treatment
means if $\Sigma\lambda_i = 0$. The comparison may include all a treatment means
($k = a$), or only some of the means ($k < a$).

Rule 10.7.1. The standard error of L is $\sqrt{\Sigma\lambda_i^2}\,(\sigma/\sqrt{n})$, and the estimated
standard error is $\sqrt{\Sigma\lambda_i^2}\,(s/\sqrt{n})$, with degrees of freedom equal to
those in s, where n is the number of observations in each mean $\bar X_i$.

In the example the value of $s/\sqrt{n}$ was 1.37 with 24 d.f. Hence, for
the average effect of PK, with $\lambda_1 = 1/2$, $\lambda_2 = 1/2$, $\lambda_3 = -1$, the estimated
standard error is

$$\sqrt{(1/2)^2 + (1/2)^2 + (-1)^2}\,(1.37) = \sqrt{1.5}\,(1.37) = 1.68,$$

with 24 d.f. The value of t for testing the average effect of PK is
t = 5.4/1.68 = 3.2, significant at the 1% level. Confidence limits (95%)
are 5.4 ± (2.06)(1.68), or 1.9 and 8.9 cwt. per acre.

For the difference between the April and December applications,
with $\lambda_1 = 1$, $\lambda_2 = -1$, the estimated standard error is $\sqrt{2}\,(1.37) = 1.94$.
The difference is not significant at the 5% level, the confidence limits
being 2.0 ± (2.06)(1.94), or -2.0 and +6.0.
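Rule 10.7.1 can be sketched as a small Python function. The 5% t value for 24 d.f. is entered as 2.064 from standard t tables (the text rounds it to 2.06), and the function name `comparison` is hypothetical. As example 10.7.3 notes, the s.e. formula still applies when the coefficients do not sum to zero, so the function does not enforce that condition.

```python
import math

def comparison(means, lambdas, se_mean, t_crit):
    """Return L, its estimated s.e. (Rule 10.7.1), t = L/s_L and 95% limits.

    se_mean is s/sqrt(n) for a single mean; t_crit is the two-sided 5% t value
    for the d.f. in s.
    """
    estimate = sum(lam * m for lam, m in zip(lambdas, means))
    se = math.sqrt(sum(lam ** 2 for lam in lambdas)) * se_mean
    return estimate, se, estimate / se, (estimate - t_crit * se, estimate + t_crit * se)

means = [68.8, 66.8, 62.4]   # PK in April, PK in December, no PK (cwt. per acre)
se_mean = 1.37               # s/sqrt(n), 24 d.f.
t_24 = 2.064                 # two-sided 5% t value for 24 d.f.

pk = comparison(means, [0.5, 0.5, -1], se_mean, t_24)
apr_dec = comparison(means, [1, -1, 0], se_mean, t_24)
for name, (est, se, t, (lo, hi)) in (("PK effect", pk), ("April - December", apr_dec)):
    print(f"{name}: {est:.1f} +/- {se:.2f}, t = {t:.2f}, 95% limits {lo:.1f} to {hi:.1f}")
```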

In view of the importance of Rule 10.7.1, we shall sketch the proof of
this result. Since the $\lambda_i$ are fixed numbers, the population mean of L is

$$\mu_L = \lambda_1\mu_1 + \lambda_2\mu_2 + \cdots + \lambda_k\mu_k,$$

where $\mu_i$ is the population mean of $\bar X_i$. Hence,

$$L - \mu_L = \lambda_1(\bar X_1 - \mu_1) + \lambda_2(\bar X_2 - \mu_2) + \cdots + \lambda_k(\bar X_k - \mu_k)$$

By definition, the variance of L is the average value of $(L - \mu_L)^2$ taken over
the population. Now

$$(L - \mu_L)^2 = \sum_{i=1}^{k}\lambda_i^2(\bar X_i - \mu_i)^2 + 2\sum_{i=1}^{k}\sum_{j>i}\lambda_i\lambda_j(\bar X_i - \mu_i)(\bar X_j - \mu_j)$$

The average value of $(\bar X_i - \mu_i)^2$ over the population is of course the
variance of $\bar X_i$. The average value of $(\bar X_i - \mu_i)(\bar X_j - \mu_j)$ is the quantity
which we called the covariance of $\bar X_i$ and $\bar X_j$ (section 7.4, p. 181). This
gives the general formula,



270 Chapter 10: One-Way Classifications. Analysis of Variance

$$V(L) = \sum_{i=1}^{k}\lambda_i^2 V(\bar X_i) + 2\sum_{i=1}^{k}\sum_{j>i}\lambda_i\lambda_j\,\mathrm{Cov}(\bar X_i, \bar X_j) \qquad (10.7.1)$$

When the $\bar X_i$ are the means of independent samples of size n, $V(\bar X_i) = \sigma^2/n$,
and $\mathrm{Cov}(\bar X_i, \bar X_j) = 0$, giving

$$V(L) = (\Sigma\lambda_i^2)\,\sigma^2/n,$$

in agreement with Rule 10.7.1.
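Equation (10.7.1) translates directly into a double loop over the variances and covariances of the means. The sketch below is pure Python; the function name `variance_of_l` and the illustrative values σ² = 4, n = 8 are assumptions for demonstration only. With independent means it reproduces Rule 10.7.1.

```python
def variance_of_l(lambdas, cov):
    """Equation (10.7.1): V(L) = sum lam_i^2 V(X_i) + 2 sum_{i<j} lam_i lam_j Cov(X_i, X_j).

    cov is a square matrix (list of lists) whose diagonal holds V(X_i).
    """
    k = len(lambdas)
    total = sum(lambdas[i] ** 2 * cov[i][i] for i in range(k))
    for i in range(k):
        for j in range(i + 1, k):
            total += 2 * lambdas[i] * lambdas[j] * cov[i][j]
    return total

# Independent means of samples of size n: sigma^2/n on the diagonal, zero covariances.
sigma2, n = 4.0, 8
independent = [[sigma2 / n if i == j else 0.0 for j in range(3)] for i in range(3)]
lams = [0.5, 0.5, -1.0]
print(variance_of_l(lams, independent))  # 0.75 = (0.25 + 0.25 + 1) * 4 / 8
```

Passing a matrix with non-zero off-diagonal entries shows how positively correlated means reduce the variance of a difference.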

When reporting the results of a series of comparisons, it is important 
to give the sizes of the differences, with accompanying standard errors or 
confidence limits. For any comparison of broad interest, it is likely that 
several experiments will be done, often by workers in different places. 
The best information on this comparison is a combined summary of the 
results of these experiments. In order to make this, an investigator needs
to know the sizes of the individual results and their standard errors. If 
he is told merely that “the difference was not significant” or “the differ- 
ence was significant at the 1% level,” he cannot begin to summarize effec- 
tively. 

For the example, a report might read as follows. “Application of 
mineral fertilizers produced a significant average increase in sugar of 5.4 
cwt. per acre (±1.68). The yield of the April application exceeded that
of the December application by 2.0 cwt. (±1.94), but this difference was 
not significant.” 

Comments: (i) Unless this is already clear, the report should state
the amounts of P and K that were applied; (ii) there is much to be said for
presenting, in addition, a table of the treatment (class) means, with their 
standard error, ±1.37. This allows the reader to judge whether the gen- 
eral level of yield was unusual in any way, and to make other comparisons 
that interest him. 

Further examples of planned comparisons appear in the next two
chapters. Common cases are the comparison of a “no minerals” treatment
with minerals applied in four different ways (section 11.3), the comparison
of different levels of the same ingredient, usually at equal intervals,
where the purpose is to fit a curve that describes the relation between yield
and the amount of the ingredient (section 11.8), and factorial experimentation,
which forms the subject of chapter 12.

Incidentally, when several different comparisons are being made, one 
or two of the comparisons may show significant effects even if the initial 
F-test shows non-significance. 

The rule that a comparison L is declared significant at the 5% level
if $L/s_L$ exceeds $t_{0.05}$ is recommended for any comparisons that the experiment
was designed to make. Sometimes, in examining the treatment
means, we notice a combination which we did not intend to test but which
seems unexpectedly large. If we construct the corresponding L, use of the
t-test for testing $L/s_L$ is invalid, since we selected L for testing solely
because it looked large.
Scheffé (11) has given a general method that provides a conservative
test in this situation. Declare $L/s_L$ significant only if it exceeds
$\sqrt{(a - 1)F_{0.05}}$, where $F_{0.05}$ is the 5% level of F for degrees of freedom
$f_1 = a - 1$, $f_2 = a(n - 1)$. In more complex experiments, $f_2$ is the number
of error d.f. provided by the experiment. Scheffé's test agrees with
the t-test when a = 2, and requires a substantially higher value of $L/s_L$
for significance when a > 2. It allows us to test any number of comparisons,
picked out by inspection, with the protection that the probability
of finding any erroneous significant result is at most 0.05.
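Scheffé's criterion is a one-line computation once $F_{0.05}$ has been read from an F table. In this sketch the sugar-beet setting (a = 3, 24 error d.f.) is borrowed only for illustration, with $F_{0.05}(2, 24) = 3.40$ taken from standard F tables; the function name is hypothetical. Had the PK contrast been picked out by inspection rather than planned, its t = 3.2 would still exceed the resulting criterion of about 2.61.

```python
import math

def scheffe_critical(a, f_05):
    """Scheffe's criterion: L/s_L is significant only if it exceeds sqrt((a - 1) * F_0.05).

    f_05 is the 5% point of F with f1 = a - 1 and f2 = error d.f., read from an F table.
    """
    return math.sqrt((a - 1) * f_05)

# a = 3 treatments, 24 error d.f.; F_0.05(2, 24) = 3.40 from F tables.
crit = scheffe_critical(3, 3.40)
print(round(crit, 2))  # 2.61, versus t_0.05 = 2.06 for a planned comparison
```

With a = 2 the criterion is $\sqrt{F_{0.05}}$ on 1 and $f_2$ d.f., which is the ordinary $t_{0.05}$.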

EXAMPLE 10.7.1 — In an experiment in which mangolds were grown on acid soil (6),
part of the treatments were: (i) chalk, (ii) lime, both applied at the rate of 21 cwt. calcium
oxide (CaO) per acre, and (iii) no liming. For good reasons, there were twice as many “no
lime” plots as plots with chalk or with lime. Consequently, the comparisons of interest may
be expressed algebraically as

Effect of CaO: $\tfrac{1}{2}(\bar X_1 + \bar X_2) - \tfrac{1}{2}\Sigma\bar X_A$,

where $\bar X_A$ represent the two “no lime” classes.

Chalk minus lime: $\bar X_1 - \bar X_2$.

The mean yields were (tons per acre): chalk, 14.82; lime, 13.42; no lime, 9.74. The
s.e. of any $\bar X_i$ was ±2.06 tons, with 25 d.f. Calculate the two comparisons and their standard
errors, and write a report on the results. Ans. Effect of CaO, 4.38 ± 2.06 tons. Chalk
minus lime, 1.40 ± 1.98 tons.

EXAMPLE 10.7.2 — An experiment on sugar beet (7) compared times and methods of 
applying mixed artificial fertilizers (NPK). The mean yields of sugar (cwt. per acre) were as
follows: 



                  No             Artificials applied in:
              Artificials   Jan. (Ploughed)   Jan. (Broadcast)   Apr. (Broadcast)
Mean             38.7            48.7              48.8               45.0
Symbol        $\bar X_1$       $\bar X_2$        $\bar X_3$         $\bar X_4$


Their s.e. was ±1.22, with 14 d.f. Calculate 95% confidence limits for the following comparisons:

Average effect of artificials: $\tfrac{1}{3}(\bar X_2 + \bar X_3 + \bar X_4) - \bar X_1$
January minus April application: $\tfrac{1}{2}(\bar X_2 + \bar X_3) - \bar X_4$
Broadcast minus Ploughed in Jan.: $\bar X_3 - \bar X_2$

Ans.: (i) (5.8, 11.8); (ii) (0.6, 7.0); (iii) (-3.6, +3.8) cwt. per acre.
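The three sets of limits in example 10.7.2 can be reproduced with the following sketch. It assumes $t_{0.05} = 2.145$ for 14 d.f. (standard t tables), and the dictionary keys are made-up labels. Carrying full precision gives a lower limit near 0.5 for comparison (ii), where the printed answer, presumably from rounded intermediate values, shows 0.6.

```python
import math

means = {"none": 38.7, "jan_plough": 48.7, "jan_broad": 48.8, "apr_broad": 45.0}
se_mean, t_14 = 1.22, 2.145  # s.e. of each mean (14 d.f.) and the 5% t value for 14 d.f.

# Coefficients lambda_i for each comparison; omitted means have lambda = 0.
comparisons = {
    "average effect of artificials": {"jan_plough": 1/3, "jan_broad": 1/3, "apr_broad": 1/3, "none": -1},
    "January minus April": {"jan_plough": 0.5, "jan_broad": 0.5, "apr_broad": -1},
    "broadcast minus ploughed (Jan.)": {"jan_broad": 1, "jan_plough": -1},
}

limits = {}
for name, lams in comparisons.items():
    est = sum(lam * means[key] for key, lam in lams.items())
    se = math.sqrt(sum(lam ** 2 for lam in lams.values())) * se_mean  # Rule 10.7.1
    limits[name] = (est - t_14 * se, est + t_14 * se)
    print(f"{name}: {limits[name][0]:.1f} to {limits[name][1]:.1f} cwt. per acre")
```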

EXAMPLE 10.7.3 — One can encounter linear combinations of the means that are not
comparisons as we have defined them, but this seems to be rare. For instance, in early
experiments on vitamin B12, rats were fed on a B12-deficient diet until they ceased to gain in
weight. If we then compared a single and a double supplement of B12, measuring the subsequent
gains in weight produced, it might be reasonable to calculate $(\bar X_2 - 2\bar X_1)$, which should
be zero if the gain in weight is proportional to the amount of B12. Here $\lambda_1 + \lambda_2 \neq 0$. The
formula for the standard error still holds. The s.e. is $\sqrt{5\sigma^2/n}$ in this example.

10.8 — Inspection of all differences between pairs of means. Often,
the investigator has no specific comparisons, chosen in advance, that 
he proposes to make. Instead, he looks at all the means to see which 
differences among them appear to be real. The most frequent example
is when the treatments are qualitatively similar, as in tests on working 
gloves made by different manufacturers. 

Taking the doughnut data from table 10.2.1 as an illustration, the 
means for the four fats (arranged in increasing order) are as follows: 

TABLE 10.8.1

Fat                     4     1     3     2     LSD     D
Mean grams absorbed    62    72    76    85    12.1   16.2


The standard error of the difference between two means, $\sqrt{2s^2/n}$, is
±5.80, with 20 d.f. (table 10.2.1). The 5% value of t with 20 d.f. is 2.086.
Hence, the difference between a specific pair of means is significant at the
5% level if it exceeds (2.086)(5.8) = 12.1.

The highest mean, 85 for fat 2, is significantly greater than the means 
72 for fat 1 and 62 for fat 4. The mean 76 for fat 3 is significantly greater 
than the mean 62 for fat 4. None of the other three differences between 
pairs reaches 12.1. The quantity 12.1 which serves as a criterion is called 
the Least Significant Difference (LSD). Similarly, 95% confidence limits 
for the population difference between any pair of means are given by 
adding ±12.1 to the observed difference. 
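The LSD procedure for the doughnut means can be sketched as follows. The dictionary layout and the (higher fat, lower fat) pair convention are choices made for this illustration.

```python
import itertools

fat_means = {4: 62, 1: 72, 3: 76, 2: 85}  # grams absorbed, table 10.8.1
se_diff = 5.80   # sqrt(2 s^2 / n), 20 d.f.
t_20 = 2.086     # two-sided 5% t value, 20 d.f.
lsd = t_20 * se_diff

# Every pair, taken in increasing order of mean, tested against the LSD.
ranked = sorted(fat_means, key=fat_means.get)
significant = [
    (hi, lo) for lo, hi in itertools.combinations(ranked, 2)
    if fat_means[hi] - fat_means[lo] > lsd
]
print(round(lsd, 1), significant)  # 12.1 [(3, 4), (2, 4), (2, 1)]
```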

Objections to indiscriminate use of the LSD in significance tests
have been raised for many years. Suppose that all the population means
$\mu_i$ are equal, so that there are no real differences. With five types of gloves,
for instance, there are ten possible comparisons between pairs of means.
The probability that at least one of the ten exceeds the LSD is bound to
be greater than 0.05; it can be shown to be about 0.29. With ten means
(45 comparisons among pairs) the probability of finding at least one significant
difference is about 0.63, and with 15 means it is around 0.83.
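The error rates quoted above can be checked approximately by simulation. This sketch assumes the error variance is known (effectively infinite d.f.), so the LSD uses the normal value 1.96; with finite d.f. the figures change somewhat. The results are Monte Carlo estimates, not exact values, and the function name is hypothetical.

```python
import random

Z_025 = 1.959964  # two-sided 5% point of the standard normal

def prob_any_lsd_exceeded(a, reps=20_000, seed=1):
    """Estimate P(at least one pairwise difference exceeds the 5% LSD) when all means are equal.

    Treats the error variance as known (infinite d.f.), so each mean is N(0, 1)
    and a difference of two means has s.e. sqrt(2).
    """
    rng = random.Random(seed)
    lsd = Z_025 * 2 ** 0.5
    hits = 0
    for _ in range(reps):
        means = [rng.gauss(0.0, 1.0) for _ in range(a)]
        # The largest pairwise difference is the range of the means.
        if max(means) - min(means) > lsd:
            hits += 1
    return hits / reps

for a in (5, 10, 15):
    print(a, round(prob_any_lsd_exceeded(a), 2))
```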

When the $\mu_i$ are all equal, the LSD method still has the basic property
of a test of significance, namely that about 5% of the tested differences 
will erroneously be declared significant. The trouble is that when many 
differences are tested, some that appear significant are almost certain to be 
found. If these are the ones that are reported and attract attention, the 
test procedure loses its valuable property of protecting the investigator 
against making erroneous claims. 

Commenting on this issue, Fisher (8) wrote: “When the z test (i.e.,
the F-test) does not demonstrate significance, much caution should be
used before claiming significance for special comparisons.” In line with
this remark, investigators are sometimes advised to use the LSD method
only if F is significant.

Among other proposed methods, perhaps the best known is one
which replaces the LSD by a criterion based on the tables of the Studentized
Range, $Q = (\bar X_{\max} - \bar X_{\min})/s_{\bar X}$. Table A 15 gives the upper 5% levels
of Q, i.e., the value exceeded in 5% of experiments. This value depends
on the number of means, a, and the number f of d.f. in $s_{\bar X}$. Having read
$Q_{0.05}$ from table A 15, we compute the difference D between two means
that is required for 5% significance as $Q_{0.05}\,s_{\bar X}$.

For the doughnuts, a = 4, f = 20, we find $Q_{0.05} = 3.96$. Hence
$D = Q_{0.05}\,s_{\bar X} = (3.96)(4.1) = 16.2$. Looking back at table 10.8.1, only
the difference between fats 2 and 4 is significant with this criterion. When
there are only two means, the Q method becomes identical with the LSD
method. Otherwise Q requires a larger difference for significance than
the LSD.
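In code, the Q criterion differs from the LSD only in the multiplier. In this sketch $Q_{0.05} = 3.96$ is the table value quoted in the text; nothing is computed from the studentized range distribution itself.

```python
import itertools

fat_means = {4: 62, 1: 72, 3: 76, 2: 85}
s_mean = 4.1        # s.e. of a single mean, 20 d.f.
q_05 = 3.96         # upper 5% studentized range for a = 4 means, f = 20 (table A 15)
d = q_05 * s_mean   # difference required for 5% significance

ranked = sorted(fat_means, key=fat_means.get)
significant = [
    (hi, lo) for lo, hi in itertools.combinations(ranked, 2)
    if fat_means[hi] - fat_means[lo] > d
]
print(round(d, 1), significant)  # 16.2 [(2, 4)]
```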

The Q method has the property that if we test some or all of the
differences between pairs of means, the probability that no erroneous
claim of significance will be made is at least 0.95. Similarly, the probability that
all the confidence intervals $(\bar X_i - \bar X_j) \pm D$ will correctly include the difference
$\mu_i - \mu_j$ is 0.95. The price paid for this increased protection is, of
course, that fewer differences $\mu_i - \mu_j$ that are real will be detected and
that confidence intervals are wider.

EXAMPLE 10.8.1 — In Case III of the constructed example in table 10.4.1, with $\mu_1 = 3$,
$\mu_2 = 5$, $\mu_3 = 9$, the observed means are $\bar X_1 = 2.9$, $\bar X_2 = 4.0$, $\bar X_3 = 9.25$, with s.e. $\sqrt{s^2/n}$
= 0.75 (3 d.f.). Test the three differences by (i) the LSD test, (ii) the Q test. Construct a
confidence interval for each difference by each method. (iii) Do all the confidence intervals
include $(\mu_i - \mu_j)$? Ans. (i) LSD = 3.37. $\bar X_3$ significantly greater than $\bar X_2$ and $\bar X_1$. (ii) Required
difference = 4.43. Same significant differences. (iii) Yes.

EXAMPLE 10.8.2 — In example 10.5.1, the mean gains in weight of baby chicks under
four feeding treatments were $\bar X_1 = 43.8$, $\bar X_2 = 71.0$, $\bar X_3 = 81.4$, $\bar X_4 = 142.8$, while $\sqrt{s^2/n}$
= 12.0 with 16 d.f. Compare the means by the LSD and the Q methods. Ans. Both methods
show that $\bar X_4$ differs significantly from any other mean. The LSD method gives $\bar X_3$
significantly greater than $\bar X_1$.


Hartley (30) showed that a sequential variant of the Q method, 
originally due to Newman (10) and Keuls (31), gives the same type of 
protection and is more powerful; that is, the variant will detect real dif- 
ferences more frequently than the original Q method. 

Arrange the means in ascending order. For the doughnut fats, these 
means are as follows : 


Fat       4     1     3     2     $s_{\bar X}$
Mean     62    72    76    85     ±4.10 (20 d.f.)


As before, first test the extreme difference, fat 2 - fat 4 = 23, against
D = 16.2. Since the difference exceeds D, proceed to test fat 2 - fat
1 = 13 and fat 3 - fat 4 = 14 against the D value for a = 3, because these
comparisons are differences between the highest and lowest of a group of
three means. For a = 3, f = 20, Q is 3.58, giving D = (3.58)(4.10) = 14.7.
Both the differences, 13 and 14, fall short of D. Consequently we stop;
the difference between fats 2 and 4 is the only significant difference in the
experiment. If fat 3 - fat 4 had been, say, 17, we would have declared
this difference significant and next tested fat 3 - fat 1 and fat 1 - fat 4
against the D value for a = 2.

Whenever the highest and lowest of a group of means are found not 
significantly different in this method, we declare that none of the members 
of this group is distinguishable. This rule avoids logical contradictions in 
the conclusions. The method is called sequential because the testing fol- 
lows a prescribed order or sequence. 
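The sequential rule can be written as a short step-down loop. This is a sketch of the procedure as described here, not a reference implementation, and the function name is hypothetical. It needs the 5% Q value for each group size; these are taken as 3.96 and 3.58 for 4 and 3 means with 20 d.f. (as quoted in the text) and 2.95 for 2 means, which equals 2.086 × √2, the LSD case.

```python
def newman_keuls(means, s_mean, q_table):
    """Sequential (Newman-Keuls) comparison of ranked means.

    means: dict label -> mean; s_mean: s.e. of one mean;
    q_table: dict p -> upper 5% studentized range for p means (same error d.f.).
    Returns the set of (higher, lower) label pairs declared significant.
    """
    order = sorted(means, key=means.get)  # ascending means
    a = len(order)
    nonsig_groups = []                    # (start, end) index ranges found not significant
    significant = set()
    for p in range(a, 1, -1):             # group sizes a, a-1, ..., 2
        d = q_table[p] * s_mean
        for start in range(a - p + 1):
            end = start + p - 1
            # A group inside one already found homogeneous is not tested.
            if any(s <= start and end <= e for s, e in nonsig_groups):
                continue
            if means[order[end]] - means[order[start]] > d:
                significant.add((order[end], order[start]))
            else:
                nonsig_groups.append((start, end))
    return significant

fat_means = {4: 62, 1: 72, 3: 76, 2: 85}
q_20df = {2: 2.95, 3: 3.58, 4: 3.96}  # 5% Q values for f = 20
print(newman_keuls(fat_means, 4.10, q_20df))  # {(2, 4)}
```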

Since protection against false claims of significance is obtained by 
decreasing the ability to detect real differences, a realistic choice among 
these methods requires a judgment about the relative seriousness of the 
two kinds of mistake. Duncan (32) has examined the type of policy that 
emerges if the investigator assigns relative costs to (i) declaring a signifi- 
cant result when the true difference is zero, (ii) declaring non-significance 
when there is a true difference, (iii) declaring a significant result in the 
wrong direction. His policy is designed to minimize the average cost of 
mistakes in such verdicts of significance or non-significance. These costs 
are not necessarily monetary but might be in terms of utility or equity. 
His optimum policy resembles an LSD rule with two notable differences. 
In its simplest form, which applies when the number of treatments exceeds 
15 and d.f. in s exceed 30, a difference between two means is declared
significant if it exceeds $s_D\,t_\alpha\sqrt{F/(F-1)}$. The quantity $t_\alpha$ (not Student's t)
depends on the relative costs assigned to wrong verdicts of significance or
non-significance. If F is large, indicating that there are substantial differences
among the population means of the treatments, $\sqrt{F/(F-1)}$ is
nearly 1. The rule then resembles a simple LSD rule, but with the size
of the LSD determined by the relative costs. As F approaches 1, suggest- 
ing that differences among treatment means are in general small, the 
difference required for significance becomes steadily larger, leading to 
greater caution in declaring differences significant. The F-value given by 
the experiment enters into the rule because F provides information as to 
whether real differences among treatment means are likely to be large or 
small. In Duncan’s method, the investigator may also build into the rule 
his a priori judgment on this point. 
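The effect of the factor $\sqrt{F/(F-1)}$ in the simplest form of Duncan's rule is easy to tabulate. In this sketch $t_\alpha$ must be supplied by the user, since it depends on the relative costs; the value 1.0 below is a placeholder, not a recommended choice. The case F ≤ 1 is simply rejected, and the function name is hypothetical.

```python
import math

def duncan_simple_threshold(s_d, t_alpha, f_ratio):
    """Difference required by the simplest form of Duncan's rule: s_D * t_alpha * sqrt(F / (F - 1)).

    s_d: standard error of a difference between two means.
    t_alpha: cost-based constant (not Student's t), supplied by the user.
    f_ratio: F from the experiment; this sketch only handles F > 1.
    """
    if f_ratio <= 1:
        raise ValueError("sketch assumes F > 1")
    return s_d * t_alpha * math.sqrt(f_ratio / (f_ratio - 1))

# Placeholder s_D = t_alpha = 1.0, so the output is just the multiplier sqrt(F/(F-1)).
for f in (1.2, 2.0, 5.0, 50.0):
    print(f, round(duncan_simple_threshold(1.0, 1.0, f), 3))
```

The printed multipliers fall toward 1 as F grows, which is the behavior described in the text.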

In a large sampling experiment with four treatments, Balaam (33) 
compared (i) the LSD method, (ii) the revised LSD method in which no
significant differences are declared unless F is significant, (iii) the
Newman-Keuls method (as well as other methods). Various sets of values were
assigned to the population means $\mu_i$, including a set in which all $\mu_i$ were
equal. For each pair of means, a test procedure received a score of +1
if it ranked them correctly, a score 0 if it declared a significant difference
when $\mu_i = \mu_j$ or found no difference when $\mu_i \neq \mu_j$, and a score -1 if it
ranked the means in the wrong order. These scores were added over the
six pairs of means. 

When all $\mu_i$ were equal, the average scores were: LSD, 5.76; Revised
LSD, 5.91; NK, 5.94. With three means equal, so that three of the
six differences between pairs were equal and three unequal, average scores
were: LSD, 3.80; Revised LSD, 3.57; NK, 3.51. With more than three
inequalities between pairs, average scores were: LSD, 1.92; Revised
LSD, 1.13; NK, 1.63. To sum up for this section, no method is uniformly
best. In critical situations, try to judge the relative costs of the two kinds
of mistakes and be guided by these costs. For routine purposes, thoughtful
use of either the LSD or the Newman-Keuls method should be satisfactory.
Remember also Scheffé's test (p. 271) for a comparison that is
picked out just because it looks large.

10.9 — Shortcut computation using ranges. An easy method of testing
all comparisons among means is based on the ranges of the samples (13).
In the doughnut experiment, table 10.2.1, the four ranges are 39, 20, 30, 21;
the sum is 110. This sum of ranges is multiplied by a factor taken from
table 10.9.1. In the column for a = 4 and the row for n = 6, take the
factor 0.95. Then

$$D' = \frac{(\text{Factor})(\text{Sum of Ranges})}{n} = \frac{(0.95)(110)}{6} = 17.4$$

D' is used like the D in the Q-test of the foregoing section. Comparing it
with the six differences among treatments, we conclude, as before, that
only the largest difference, 23, is significant.
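The range shortcut needs only the within-sample ranges and a factor from table 10.9.1. A minimal sketch for the doughnut data, with the factor 0.95 entered by hand:

```python
ranges = [39, 20, 30, 21]  # within-fat ranges, doughnut data (table 10.2.1)
n = 6                      # doughnuts per fat
factor = 0.95              # table 10.9.1, a = 4 samples, n = 6

d_prime = factor * sum(ranges) / n
fat_means = {4: 62, 1: 72, 3: 76, 2: 85}

# All six differences (higher fat, lower fat) -> difference in grams.
diffs = {(i, j): fat_means[i] - fat_means[j]
         for i in fat_means for j in fat_means if fat_means[i] > fat_means[j]}
significant = [pair for pair, d in diffs.items() if d > d_prime]
print(round(d_prime, 1), significant)  # 17.4 [(2, 4)]
```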


TABLE 10.9.1

Critical Factors for Allowances, 5% Risk*

Sample                        Number of Samples, a
Size, n     2      3      4      5      6      7      8      9      10
   2      3.43   2.35   1.74   1.39   1.15   0.99   0.87   0.77   0.70
   3      1.90   1.44   1.14   0.94   0.80   0.70   0.62   0.56   0.51
   4      1.62   1.25   1.01   0.84   0.72   0.63   0.57   0.51   0.47
   5      1.53   1.19   0.96   0.81   0.70   0.61   0.55   0.50   0.45
   6      1.50   1.17   0.95   0.80   0.69   0.61   0.55   0.49   0.45
   7      1.49   1.17   0.95   0.80   0.69   0.61   0.55   0.50   0.45
   8      1.49   1.18   0.96   0.81   0.70   0.62   0.55   0.50   0.46
   9      1.50   1.19   0.97   0.82   0.71   0.62   0.56   0.51   0.47
  10      1.52   1.20   0.98   0.83   0.72   0.63   0.57   0.52   0.47

* Extracted from a more extensive table by Kurtz, Link, Tukey, and Wallace (13). 


EXAMPLE 10.9.1 — Using the shortcut method, examine all differences in the chick
experiment of example 10.5.1 (p. 267). Ans. D' = 49. Same conclusions as for the Q
method in example 10.8.2.

10.10 — Model I. Fixed treatment effects. It is time to make a more 
formal statement about the assumptions underlying the analysis of vari- 
ance for single classifications. A notation common in statistical papers 
is to use the subscript i to denote the class, where i takes on the values 
1, 2, ... a. The subscript j designates the members of a class, j going
from 1 to n. 





Within class i, the observations $X_{ij}$ are assumed normally distributed
about a mean $\mu_i$ with variance $\sigma^2$. The mean $\mu_i$ may vary from class to
class, but $\sigma^2$ is assumed the same in all classes. We denote the mean of
the a values of $\mu_i$ by $\mu$, and write $\mu_i = \mu + \alpha_i$. It follows, of course, that
$\Sigma\alpha_i = 0$. Mathematically, the model may be written:

$$X_{ij} = \mu + \alpha_i + \epsilon_{ij}; \quad i = 1, \ldots, a, \quad j = 1, \ldots, n, \quad \epsilon_{ij} = \mathcal{N}(0, \sigma)$$

In words : 

Any observed value is the sum of three parts: (i) an overall mean, (ii)
a treatment or class deviation, and (iii) a random element from a normally
distributed population with mean zero and standard deviation σ.

The artificial data in table 10.4.1 were made up according to this
model. In Case II, with $\mu_i = 4, 5, 7$, we have $\mu = 16/3$, $\alpha_1 = -4/3$,
$\alpha_2 = -1/3$, $\alpha_3 = +5/3$. The $\epsilon_{ij}$ were drawn from a table of normal deviates
with σ = 1.
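Data following model I can be generated as in this sketch, which mirrors the Case II setting ($\mu_i$ = 4, 5, 7 and σ = 1). The class size n = 4 and the random seed are arbitrary choices for illustration, so the draws will not reproduce table 10.4.1.

```python
import random

mu_i = [4.0, 5.0, 7.0]             # Case II class means
mu = sum(mu_i) / len(mu_i)         # overall mean, 16/3
alpha = [m - mu for m in mu_i]     # class deviations: -4/3, -1/3, +5/3 (they sum to zero)

def simulate_model_i(n, sigma=1.0, seed=0):
    """Draw X_ij = mu + alpha_i + e_ij, with e_ij ~ N(0, sigma), for n members of each class."""
    rng = random.Random(seed)
    return [[mu + a_i + rng.gauss(0.0, sigma) for _ in range(n)] for a_i in alpha]

data = simulate_model_i(n=4)
for i, row in enumerate(data, start=1):
    print(f"class {i}:", [round(x, 2) for x in row])
```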

This model is often called model I, the fixed effects model. Its dis- 
tinctive feature is that the effects of the treatments or classes, measured 
by the parameters $\alpha_i$, are regarded as fixed but unknown quantities to be
estimated. 

10.11 — Effects of errors in the assumptions. For the user of the analysis
of variance, two relevant questions are: (i) Are the assumptions satisfied
in my data? (ii) Does it make any difference if they are not satisfied?

Real data are seldom, if ever, exactly normally distributed. Often
they exhibit some skewness; if symmetrical, they may have longer tails
than the normal distribution. Three situations in which one should be
on the lookout for non-normality are: (i) with small whole numbers,
whose distribution may approximate the Poisson rather than the normal,
(ii) with proportions or percentages that cover a range extending nearly
to zero or 100%, and (iii) cases in which the treatments (or classes) produce
multiplicative effects. Model I assumes that the effect of the ith
class is to add $\alpha_i$ to any existing value. If, instead, the effect is to multiply
the existing value by, say, 60%, the observations are likely to approximate
a distribution called the lognormal. This is a skew distribution of values
X such that log X is normally distributed.

In a single classification with equal n, various mathematical studies
agree in showing that the F-test is little affected by moderate non-normality.
However, with non-normal data, the variance $\sigma_i^2$ within a class is
often related to the mean $\mu_i$ of the class. For the Poisson distribution,
you may recall that $\sigma_i^2 = \mu_i$. With a proportion, the variance may behave
like $\mu_i(1 - \mu_i)$, and with the lognormal distribution, $\sigma_i^2$ tends to
vary as $\mu_i^2$. It follows that with non-normal data, the use of a pooled
estimate of error $s^2$ in comparing pairs or subgroups of means can be
seriously misleading. With two treatments A and B that produce small
means, $\sigma^2$ might be about 20, while with C and D, which give large means,
$\sigma^2$ is about 60. The pooled $s^2$ will be about 40. For comparing A with
B, the pooled $s^2$ gives a t-value that is too small by a factor $\sqrt{2} = 1.41$,
while for comparing C with D, t is too large by a factor $\sqrt{3/2}$. Heterogeneous
variance also occurs occasionally because some treatments by their
nature produce erratic effects: sometimes they work well, sometimes not.
Here there may be no clear relation between $\sigma_i^2$ and $\mu_i$.

When comparing two classes, a safe rule is to calculate $s^2$ from the
data for these two classes only. The disadvantage is that the number of
d.f. is reduced (see also section 4.14). With a single erratic treatment
(the ith), a pooled $s^2$ can be calculated and used for comparisons among
the remaining treatments, and a separate $s_i^2$ for the erratic one. The s.e.
of $(\bar X_i - \bar X_j)$ is estimated as

$$\sqrt{(s_i^2 + s^2)/n}$$

When the relation between $\sigma_i^2$ and $\mu_i$ is caused by non-normality,
a knowledge of the type of data, plus a look at the relation between $\bar X_i$
and $R_i$ (the range within the class), helps in deciding whether the data are of
the Poisson type ($R_i \propto \sqrt{\bar X_i}$), the quasi-binomial type ($R_i \propto \sqrt{\bar X_i(1 - \bar X_i)}$),
or the lognormal type ($R_i \propto \bar X_i$). For these three types, transformations will
be given later (sections 11.14-11.17) that bring the data closer to normality
and often permit the use of a pooled error variance for all comparisons.


10.12 — Samples of unequal sizes. In planned experiments, the samples
from the classes are usually made of equal sizes, but in non-experimental
studies the investigator may have little control over the sizes of
the samples. As before, $X_{ij}$ denotes the jth observation from the ith
class. The symbol $X_{i.}$ denotes the class total of the $X_{ij}$, while $X_{..} = \Sigma X_{i.}$
is the grand total. The size of the sample in the ith class is $n_i$, and $N = \Sigma n_i$
is the total size of all samples. The correction for the mean is

$$C = X_{..}^2/N$$

Algebraic instructions for the df and sums of squares in the analysis 
of variance appear in table 10.12.1. 


TABLE 10.12.1

Analysis of Variance With Samples of Unequal Sizes

Source of Variation   Degrees of Freedom   Sum of Squares                                              Mean Square
Between classes       a - 1                $\Sigma X_{i.}^2/n_i - C$                                   $s_c^2$
Within classes        N - a                Subtract: $\Sigma\Sigma X_{ij}^2 - \Sigma X_{i.}^2/n_i$     $s^2$
Total                 N - 1                $\Sigma\Sigma X_{ij}^2 - C$


The F ratio, $s_c^2/s^2$, has (a - 1) and (N - a) d.f. The s.e. of the difference
between the ith and the kth class means, with (N - a) d.f., is

$$s\sqrt{\frac{1}{n_i} + \frac{1}{n_k}}$$

The s.e. of the comparison $\Sigma\lambda_i\bar X_i$ is

$$s\sqrt{\Sigma\lambda_i^2/n_i}$$
With unequal $n_i$, the F- and t-tests are more affected by non-normality
and heterogeneity of variances than with equal $n_i$ (14). Bear this in mind
when starting to analyze the data.

EXAMPLE 10.12.1 — The numbers of days survived by mice inoculated with three
strains of typhoid organisms are summarized in the following frequency distributions. Thus,
with strain 9D, 6 mice survived for 2 days, etc. We have $n_1 = 31$, $n_2 = 60$, $n_3 = 133$, N = 224.
The purpose of the analysis is to estimate and compare the mean numbers of days to death
for the three strains.

Since the variance for strain 90 looks much smaller than that for the other strains, it 
seems wise to calculate s 2 separately for each strain, rather than use a pooled s 2 from the 
analysis of variance. The calculations are given under the table. 


                    Numbers of Mice Inoculated With Indicated Strain
Days to Death           9D      11C     DSC1     Total
      2                  6        1        3        10
      3                  4        3        5        12
      4                  9        3        5        17
      5                  8        6        8        22
      6                  3        6       19        28
      7                  1       14       23        38
      8                          11       22        33
      9                           4       14        18
     10                           6       14        20
     11                           2        7         9
     12                           3        8        11
     13                           1        4         5
     14                                    1         1
Total ($n_i$)           31       60      133       224

$X_{i.} = \Sigma X$            125      442    1,037     1,604
$\Sigma X^2$                   561    3,602    8,961    13,124
$\bar X_i$                    4.03     7.37     7.80
$X_{i.}^2/n_i$                 504    3,256    8,085
$\Sigma(X_{ij} - \bar X_i)^2$   57      346      876
$s_i^2$                       1.90     5.86     6.64
The difference in mean days to death for strains 11C and 9D is 3.34 days, with

$$\text{s.e.} = \sqrt{\frac{1.90}{31} + \frac{5.86}{60}} = \sqrt{0.1591} = \pm 0.399.$$

For strains DSC1 and 11C, the difference is 0.43 days ± 0.384.

EXAMPLE 10.12.2 — As an exercise, calculate the analysis of variance for the preceding
data. Show that F = 179.5/5.79 = 31.0, f = 2 and 221. Show that if the pooled $s^2$ were
used, the s.e. of the mean difference between strains 11C and 9D would be estimated as
±0.532 instead of ±0.399.
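Both examples can be worked from the frequency distributions with a short Python sketch of table 10.12.1. Carrying full precision gives F of about 31.1 rather than the printed 31.0, a difference that appears to come from rounding the sums of squares by hand; the standard errors agree with the text to three decimals. The dictionary layout and the helper `se_diff` are illustrative.

```python
import math

# Frequency distributions of days to death (example 10.12.1): {days: number of mice}.
strains = {
    "9D":   {2: 6, 3: 4, 4: 9, 5: 8, 6: 3, 7: 1},
    "11C":  {2: 1, 3: 3, 4: 3, 5: 6, 6: 6, 7: 14, 8: 11, 9: 4, 10: 6, 11: 2, 12: 3, 13: 1},
    "DSC1": {2: 3, 3: 5, 4: 5, 5: 8, 6: 19, 7: 23, 8: 22, 9: 14, 10: 14, 11: 7, 12: 8,
             13: 4, 14: 1},
}

stats = {}
for name, freq in strains.items():
    n_i = sum(freq.values())
    total = sum(x * f for x, f in freq.items())
    sum_sq = sum(x * x * f for x, f in freq.items())
    ss = sum_sq - total ** 2 / n_i  # sum of squared deviations within the class
    stats[name] = {"n": n_i, "total": total, "sum_sq": sum_sq, "ss": ss,
                   "mean": total / n_i, "s2": ss / (n_i - 1)}

# One-way analysis of variance with unequal n_i (table 10.12.1).
N, a = sum(s["n"] for s in stats.values()), len(stats)
C = sum(s["total"] for s in stats.values()) ** 2 / N
ms_between = (sum(s["total"] ** 2 / s["n"] for s in stats.values()) - C) / (a - 1)
ms_within = sum(s["ss"] for s in stats.values()) / (N - a)
f_ratio = ms_between / ms_within

def se_diff(i, k, pooled=False):
    """s.e. of a difference in means: separate variances, or pooled s^2 * (1/n_i + 1/n_k)."""
    si, sk = stats[i], stats[k]
    if pooled:
        return math.sqrt(ms_within * (1 / si["n"] + 1 / sk["n"]))
    return math.sqrt(si["s2"] / si["n"] + sk["s2"] / sk["n"])

print(round(f_ratio, 1), round(se_diff("11C", "9D"), 3),
      round(se_diff("11C", "9D", pooled=True), 3))  # 31.1 0.399 0.532
```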

10.13 — Model II. Random effects. With some types of single classi- 
fication data, the model used and the objectives of the analysis differ from 
those under model I. Suppose that we wish to determine the average 
content of some chemical in a large population or batch of leaves. We 
select a random sample of a leaves from the population. For each 
selected leaf, n independent determinations of the chemical content are 
made, giving N = an observations in all. The leaves are the classes, and
the individual determinations are the members of a class. 

In model II, the chemical content found for the jth determination
from the ith leaf is written as

$$X_{ij} = \mu + A_i + \epsilon_{ij}, \quad i = 1, \ldots, a, \quad j = 1, \ldots, n, \qquad (10.13.1)$$

where

$$A_i = \mathcal{N}(0, \sigma_A); \quad \epsilon_{ij} = \mathcal{N}(0, \sigma)$$

The symbol μ is the mean chemical content of the population of
leaves. This is the quantity to be estimated. The symbol $A_i$ represents
the difference between the chemical content of the ith leaf and the average
content over the population. By including this term, we take account of
the fact that the content varies from leaf to leaf. Every leaf in the population
has its value of $A_i$, so that we may think of $A_i$ as a random variable
with a distribution over the population. This distribution has mean 0,
since the $A_i$ are defined as deviations from the population mean. In the
simplest version of model II, it is assumed in addition that the $A_i$ are
normally distributed with standard deviation $\sigma_A$. Hence, we have written
$A_i = \mathcal{N}(0, \sigma_A)$.

What about the term $\epsilon_{ij}$? This term is needed because:

(i) the determination is subject to an error of measurement, and

(ii) if the determination is made on a small piece of the leaf, its content
may differ from that of the leaf as a whole. The $\epsilon_{ij}$ and the $A_i$ are
assumed independent. The further assumption $\epsilon_{ij} = \mathcal{N}(0, \sigma)$ is often
made.

There are some similarities and some differences between model II
and model I. In model I

$$X_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad \alpha_i \text{ fixed}, \quad \epsilon_{ij} = \mathcal{N}(0, \sigma)$$

Note the following points:

(i) The $\alpha_i$ are fixed quantities to be estimated; the $A_i$ are random variables. As will be seen, their variance $\sigma_A^2$ is often of interest.

(ii) The null hypothesis $\alpha_i = 0$ is identical with the null hypothesis $\sigma_A = 0$, since in this event all the $A_i$ must be zero. Thus, the F-test holds also in model II, being now a test of the null hypothesis $\sigma_A = 0$.

(iii) We saw in section 10.4 that, when the null hypothesis is false, the mean square between classes under model I is an unbiased estimate of $\sigma^2 + n\sum\alpha_i^2/(a - 1)$; that is,

$$E(\text{M.S. Between}) = \sigma^2 + \frac{n\sum \alpha_i^2}{a - 1}. \tag{10.13.2}$$

There is an analogous result for model II, the mean square between classes estimating

$$E(\text{M.S. Between}) = \sigma^2 + n\sigma_A^2. \tag{10.13.3}$$

Neither result requires the assumption of normality.

(iv) In drawing repeated samples under model I, we always draw from the same set of classes with the same $\alpha_i$. Under model II, we draw a new random sample of $a$ leaves. A consequence is that the distributions of $F$ when $H_0$ is false differ in the two models. With model I, this distribution, which gives the power function, is complicated: tables by Tang (15) and charts by Pearson and Hartley (16) are available. With model II, the probability that the observed variance ratio exceeds any value $F_0$ is simply the probability that the ordinary $F$ exceeds $F_0/(1 + n\sigma_A^2/\sigma^2)$.
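Points (iii) and (iv) can be illustrated by experimental sampling, in the spirit of section 10.14. The sketch below is an illustration under stated assumptions, not a procedure from the text: it uses Python's standard `random` module, draws class effects and errors from uniform (deliberately non-normal) distributions, and averages the mean square between classes over many samples. Under model II a fresh set of class effects is drawn for every sample; under model I the same fixed effects are reused. With the assumed values $a = 5$, $n = 4$, $\sigma = 1$, $\sigma_A = 2$ and $\alpha = (-2, -1, 0, 1, 2)$, equations 10.13.3 and 10.13.2 predict $1 + 4(4) = 17$ and $1 + 4(10)/4 = 11$.

```python
import math
import random


def ms_between(groups):
    """Mean square between classes for a balanced one-way layout."""
    a, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (a * n)
    return n * sum((sum(g) / n - grand) ** 2 for g in groups) / (a - 1)


def uniform_sd(rng, sd):
    """Uniform deviate with mean 0 and standard deviation sd (deliberately non-normal)."""
    h = sd * math.sqrt(3.0)  # Uniform(-h, h) has variance h**2 / 3
    return rng.uniform(-h, h)


def average_ms_between(effect, a, n, sigma, reps=10_000, seed=1):
    """Average M.S. Between over repeated samples.

    effect(rng, i) gives the class effect of class i in one sample:
    a fresh random draw each time mimics model II, a fixed constant mimics model I.
    The overall mean is taken as 0, since it does not affect the mean square.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = []
        for i in range(a):
            effect_i = effect(rng, i)
            groups.append([effect_i + uniform_sd(rng, sigma) for _ in range(n)])
        total += ms_between(groups)
    return total / reps


ALPHAS = (-2.0, -1.0, 0.0, 1.0, 2.0)  # assumed fixed effects for model I, summing to 0

model_ii = average_ms_between(lambda rng, i: uniform_sd(rng, 2.0), a=5, n=4, sigma=1.0)
model_i = average_ms_between(lambda rng, i: ALPHAS[i], a=5, n=4, sigma=1.0)
print(f"model II: {model_ii:.2f}  (10.13.3 predicts 17)")
print(f"model I:  {model_i:.2f}  (10.13.2 predicts 11)")
```

Both averages should land close to the predicted values even though nothing in the simulation is normal, which is the sense of the remark that neither result requires normality.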

To turn to an example of model II, the data for calcium in table 10.13.1 come from a large experiment (17) on the precision of estimation of the chemical content of turnip greens. To keep the example small, we have used only the data for $n = 4$ determinations on each of $a = 4$ leaves. In the analysis of variance (shown below table 10.13.1), the mean square between leaves, $s_L^2$, is an unbiased estimate of $\sigma^2 + n\sigma_A^2 = \sigma^2 + 4\sigma_A^2$. Consequently, an unbiased estimate of $\sigma_A^2$ is

$$s_A^2 = \frac{s_L^2 - s^2}{4} = \frac{0.2961 - 0.0066}{4} = 0.0724.$$

The quantity $\sigma_A^2$ is called the component of variance for leaves. The value of $F = 0.2961/0.0066 = 44.9$ (highly significant with 3 and 12 d.f.) is an estimate of $(\sigma^2 + 4\sigma_A^2)/\sigma^2$.
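The analysis of variance for table 10.13.1 can be reproduced with a few lines of plain Python. This is a minimal sketch (the function name `one_way_model_ii` is hypothetical) for a balanced layout with the same $n$ in every class; it returns both mean squares, $F$, the component estimate $s_A^2 = (s_L^2 - s^2)/n$, and the estimated variance of the grand mean, which is the mean square between classes divided by the total number of observations.

```python
def one_way_model_ii(groups):
    """Balanced one-way analysis of variance read as a model II layout."""
    a, n = len(groups), len(groups[0])
    N = a * n
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (a * (n - 1))
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        "s2_A": (ms_between - ms_within) / n,  # estimates sigma_A^2
        "var_mean": ms_between / N,            # estimated variance of the grand mean
    }


# Calcium, per cent of dry weight: four determinations on each of four leaves (table 10.13.1).
turnip = [
    [3.28, 3.09, 3.03, 3.03],
    [3.52, 3.48, 3.38, 3.38],
    [2.88, 2.80, 2.81, 2.76],
    [3.34, 3.38, 3.23, 3.26],
]
# Average daily gains, two boars from each of four litters (Example 10.13.1).
swine = [[1.18, 1.11], [1.36, 1.65], [1.37, 1.40], [1.07, 0.90]]

for name, data in (("turnip", turnip), ("swine", swine)):
    result = one_way_model_ii(data)
    print(name, {k: round(v, 4) for k, v in result.items()})
```

For the turnip data the script gives mean squares of about 0.2961 and 0.0066, $F \approx 44.9$, $s_A^2 \approx 0.0724$ and an estimated variance of the mean of about 0.0185. Applied to the gains of Example 10.13.1, it gives $s_A^2 \approx 0.0474$ and $F \approx 7.38$, slightly below the quoted 7.41; the gap looks like rounding of the mean squares.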

We now consider the questions: (i) How precisely has the mean calcium content been estimated? (ii) Can we estimate it more economically? With $n$ determinations from each of $a$ leaves, the sample mean $\bar X_{..}$ is, from equation 10.13.1 for model II,

$$\bar X_{..} = \mu + \bar A_{.} + \bar\varepsilon_{..},$$

where $\bar A_{.}$ is the mean of $a$ independent values of $A_i$ (one for each leaf), and $\bar\varepsilon_{..}$ is the mean of $an$ independent $\varepsilon_{ij}$. Hence the variance of $\bar X_{..}$ as an estimate of $\mu$ is

$$V(\bar X_{..}) = \frac{\sigma_A^2}{a} + \frac{\sigma^2}{an} = \frac{\sigma^2 + n\sigma_A^2}{an} = \frac{\sigma^2 + 4\sigma_A^2}{16}. \tag{10.13.4}$$
TABLE 10.13.1
Calcium Concentration in Turnip Greens
(Per cent of dry weight)

Leaf    Per Cent of Calcium            Sum      Mean
1       3.28   3.09   3.03   3.03      12.43    3.11
2       3.52   3.48   3.38   3.38      13.76    3.44
3       2.88   2.80   2.81   2.76      11.25    2.81
4       3.34   3.38   3.23   3.26      13.21    3.30

Source of Variation    Degrees of Freedom    Mean Square    Parameters Estimated
Between leaves         3                     0.2961         $\sigma^2 + 4\sigma_A^2$
Determinations         12                    0.0066         $\sigma^2$

$s^2 = 0.0066$ estimates $\sigma^2$. $s_A^2 = (0.2961 - 0.0066)/4 = 0.0724$ estimates $\sigma_A^2$.


In the analysis of variance, the mean square between leaves, 0.2961, is an unbiased estimate of $\sigma^2 + 4\sigma_A^2$. Hence, from equation 10.13.4, the estimated $V(\bar X_{..})$ is $0.2961/16 = 0.0185$. This is an important result. The estimated variance of the sample mean is the Between classes mean square, divided by the total number of observations.

Suppose that the experiment is to be redesigned, changing $n$ and $a$ to $n'$ and $a'$. As in equation 10.13.4, the variance of $\bar X_{..}$ becomes

$$V'(\bar X_{..}) = \frac{\sigma_A^2}{a'} + \frac{\sigma^2}{a'n'} \doteq \frac{0.0724}{a'} + \frac{0.0066}{a'n'},$$

where the $\doteq$ sign means "is estimated by." Since the larger numerator is 0.0724, it seems clear that $a'$ should be increased and $n'$ decreased if this is possible without increasing the total cost of the experiment. If a determination costs 10 times as much as a leaf, the choice of $n' = 1$ and $a' = 15$ will cost about the same as our original data. For this new design our estimate of the variance of $\bar X_{..}$ is

$$V'(\bar X_{..}) \doteq \frac{0.0724}{15} + \frac{0.0066}{15} = 0.0053.$$


The change reduces the variance of the mean from 0.0185 to 0.0053, i.e., to less than one-third. This is because the costly determinations with small variability have been utilized to sample more leaves whose variation is large. A formula for determining the best values of $a'$ and $n'$ in a given cost situation will be found in sections 17.11 and 17.12.
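The redesign argument can be checked by brute force. The sketch below is an assumption-laden illustration, not the cost formula of sections 17.11 and 17.12: it takes a hypothetical linear cost in which a leaf costs 1 unit and a determination 10 units (the ratio used above), plugs in the estimates 0.0724 and 0.0066, and searches all designs up to an arbitrary size limit for the smallest estimated variance of the mean that fits a budget. The function names are hypothetical.

```python
def est_var_mean(a, n, s2_A=0.0724, s2=0.0066):
    """Estimated variance of the mean with a leaves and n determinations per leaf (model II)."""
    return s2_A / a + s2 / (a * n)


def cost(a, n, leaf_cost=1.0, det_cost=10.0):
    """Assumed linear cost: a leaves plus a * n determinations."""
    return a * leaf_cost + a * n * det_cost


def best_design(budget, max_a=200, max_n=20):
    """Brute-force search for the (a, n) with the smallest est_var_mean within budget."""
    feasible = [
        (est_var_mean(a, n), a, n)
        for a in range(2, max_a + 1)
        for n in range(1, max_n + 1)
        if cost(a, n) <= budget
    ]
    v, a, n = min(feasible)
    return a, n, v


for a, n in ((4, 4), (15, 1)):
    print(f"a={a:2d} n={n}  cost={cost(a, n):5.0f}  V={est_var_mean(a, n):.4f}")
print(best_design(budget=cost(4, 4)))
```

Under these assumptions the original design ($a = 4$, $n = 4$) costs 164 units with an estimated variance of about 0.0185, and the redesign ($a' = 15$, $n' = 1$) costs 165 units with about 0.0053. With the budget held strictly at 164 units, the search returns $a' = 14$, $n' = 1$, again a single determination per leaf.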

With model II, the difference $(X_{ij} - \mu)$ between a single observation and the population mean is the sum of the two terms $A_i$ and $\varepsilon_{ij}$. Hence, the variance of $X_{ij}$ is $\sigma_A^2 + \sigma^2$. The two parts are called the components of variance. The previous example illustrates how these components are used in problems of measurement, the objective being to estimate $\mu$ as economically as possible. In plant breeding, $n$ replications of each of $a$ inbred lines may be grown in an experiment. The component $\sigma_A^2$ represents differences in yield that are due to differences in the genotypes (genetic characteristics) of the inbreds, while $\sigma^2$ measures the effect of non-genetic influences on yield. The ratio $\sigma_A^2/(\sigma_A^2 + \sigma^2)$ of genetic to total variance gives a guide to the possibility of improving yield by selection of particular inbreds. The same concepts are important in human family studies, both in genetics and the social sciences, where the ratio $\sigma_A^2/(\sigma_A^2 + \sigma^2)$ now measures the proportion of the total variance that is associated with the family. The interpretation is more complex, however, since human families differ not only in genetic traits but also in environmental factors that affect the variables under study.

EXAMPLE 10.13.1—The following data were abstracted from records of performance of Poland China swine in a single inbred line at the Iowa Agricultural Experiment Station. Two boars were taken from each of four litters with common sire and fed a standard ration from weaning to about 225 pounds. Here are the average daily gains:

Litter    1      2      3      4
Gains     1.18   1.36   1.37   1.07
          1.11   1.65   1.40   0.90

Assuming that the litter variable is normally distributed, show that $\sigma_A^2$ differs significantly from zero ($F = 7.41$) and that 0.0474 estimates it.

EXAMPLE 10.13.2—There is evidence that persons estimating the crop yields of fields by eye tend to underestimate high yields and overestimate low yields. If so, and if two estimators make separate estimates of the yields of each of a number of fields, what will be the effect on (i) the model II assumptions, (ii) the estimate $s_A^2$ of the variance $\sigma_A^2$ between fields, (iii) the estimate $s^2$ of $\sigma^2$?

EXAMPLE 10.13.3 — To prove the result (10.13.3) for the expected value of the mean square between classes, show that under model II,

$$\bar X_{i.} - \bar X_{..} = (A_i - \bar A_{.}) + (\bar\varepsilon_{i.} - \bar\varepsilon_{..}),$$

$$\frac{\sum_i (\bar X_{i.} - \bar X_{..})^2}{a - 1} = \frac{\sum_i (A_i - \bar A_{.})^2}{a - 1} + \frac{\sum_i (\bar\varepsilon_{i.} - \bar\varepsilon_{..})^2}{a - 1} + \frac{2\sum_i (A_i - \bar A_{.})(\bar\varepsilon_{i.} - \bar\varepsilon_{..})}{a - 1},$$

where $\bar X_{i.}$ is the mean of the $n$ determinations in class $i$, and $\bar X_{..}$ is the overall sample mean. If a random sample of leaves has been drawn, the first term on the right is an unbiased estimate of $\sigma_A^2$, and the second of $\sigma^2/n$, since $\bar\varepsilon_{i.}$ is the mean of $n$ independent determinations. The third term vanishes, on the average in repeated sampling, if the $A_i$ and $\varepsilon_{ij}$ are independent. Multiplying by $n$ to obtain the mean square between classes, the result follows. See if you can obtain the corresponding result (10.13.2) for model I.

10.14 — Structure of model II illustrated by sampling. It is easy to construct a model II experiment by sampling from known populations. One population can be chosen to represent the individuals with variance $\sigma^2$ and another to represent the variable class effects with variance $\sigma_A^2$; then samples can be drawn from each and combined in any desired proportion. Table 10.14.1 shows such a drawing. The sample consists of two pigs from each of ten litters, the litters simulating random class effects. The individual pig components were taken from table 3.2.1 with $\sigma^2 = 100$, two of these per litter. The litter components were drawn from a population with $\sigma_A^2 = 25$ (table 3.10.1 in the fifth edition of this book).

TABLE 10.14.1
Gains in Weight of 20 Pigs in Ten Litters of Two Pigs Each
(Each gain is the sum of three components. The component for litters is a sample with $\sigma_A^2 = 25$; that for individuals is from table 3.2.1 with $\sigma^2 = 100$.)

Columns: (1) litter number; (2) litter component $A_i$; (3) pig components $\varepsilon_{ij}$; (4) = 30 + (2) + (3), sample of pig gains $X_{ij} = \mu + A_i + \varepsilon_{ij}$; (5) sample of litter gains.

(1)     (2)       (3)            (4)          (5)
1        -1        7     9       36    38     74
2         2       -4   -23       28     9     37
3        -1        0    19       29    48     77
4         0        2     2       32    32     64
5        -4        3    12       29    38     67
6       -10        9     3       29    23     52
7        10        5    -4       45    36     81
8         2      -19   -10       13    22     35
9         4       -4    18       30    52     82
10       -2       15    -6       43    22     65

Source of Variation    Degrees of Freedom    Mean Square    Parameters Estimated
Litters                9                     144.6          $\sigma^2 + 2\sigma_A^2$
Individuals            10                    96.5           $\sigma^2$

$s^2 = 96.5$ estimates 100; $s_A^2 = (144.6 - 96.5)/2 = 24.0$ estimates 25.
The usual analysis of variance is computed from table 10.14.1; then the components of variance are separated. From the 20 observations we obtained estimates $s^2 = 96.5$ of $\sigma^2 = 100$ and $s_A^2 = 24.0$ of $\sigma_A^2 = 25$, the two components that were put into the data.

This example was chosen because of its accurate estimates. An idea of ordinary variation can be got from examination of the records of 25 similar samples in table 10.14.2. One is struck immediately by the great variability in the estimates of $\sigma_A^2$, some of them being negative! These latter merely indicate that the mean square for litters is less than that for individuals; the litters vary less than random samples ordinarily do if drawn from a single, normal population. Clearly, one cannot hope for accurate estimates of $\sigma_A^2$ and $\sigma^2$ from such small samples.


TABLE 10.14.2
Estimates of $\sigma_A^2 = 25$ and $\sigma^2 = 100$ Made From 25 Samples Drawn Like That of Table 10.14.1

Sample    Estimate of          Estimate of
Number    $\sigma_A^2 = 25$    $\sigma^2 = 100$
1          60                  127
2          56                  104
3          28                   97
4           6                   91
5          18                   60
6          -5                   91
7           7                   53
8          -1                   87
9           0                   66
10        -78                  210
11         14                  148
12          7                  162
13         68                   76
14         56                  112
15        -33                  159
16         67                   54
17        -18                   90
18         33                   65
19        -21                  127
20        -48                  126
21          4                   43
22          3                  145
23         49                  142
24         75                   23
25         77                  106
Mean       17.0                102.6


EXAMPLE 10.14.1 — In table 10.14.2, how many negative estimates of $\sigma_A^2$ would be expected? Ans. A negative estimate occurs whenever the observed $F < 1$. From section 10.13, the probability that the observed $F < 1$ is the probability that the ordinary $F < 1/(1 + 2\sigma_A^2/\sigma^2)$, or, in this example, $F < 1/1.5 = 2/3$, where $F$ has 9 and 10 d.f. A property of the F distribution is that this probability is the probability that $F$, with 10 and 9 d.f., exceeds 3/2, or 1.5. From table A 14, with $f_1 = 10$, $f_2 = 9$, we see that $F$ exceeds 1.59 with $P = 0.25$. Thus about $(0.25)(25) = 6.2$ negative estimates are expected, as against 7 found in table 10.14.2.
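The sampling experiment of this section is easy to repeat with pseudo-random normal deviates in place of the printed tables. The sketch below is illustrative, not the authors' procedure: `litter_estimates` (a hypothetical name) computes $s^2$ and $s_A^2$ for one sample and is checked first against the pig gains of table 10.14.1; `simulate` then draws many samples of ten litters of two pigs with $\mu = 30$, $\sigma_A^2 = 25$ and $\sigma^2 = 100$, and reports the average estimates and the fraction of negative $s_A^2$. The seed and the number of samples are arbitrary choices.

```python
import random


def litter_estimates(litters):
    """Return (s2, s2_A) from a balanced one-way layout of litters."""
    a, n = len(litters), len(litters[0])
    grand = sum(sum(g) for g in litters) / (a * n)
    means = [sum(g) / n for g in litters]
    ms_b = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_w = sum((x - m) ** 2 for g, m in zip(litters, means) for x in g) / (a * (n - 1))
    return ms_w, (ms_b - ms_w) / n


def simulate(reps=10_000, a=10, n=2, mu=30.0, var_A=25.0, var_e=100.0, seed=2024):
    """Average s2 and s2_A over repeated model II samples, and the share of negative s2_A."""
    rng = random.Random(seed)
    sum_s2 = sum_sA2 = 0.0
    negatives = 0
    for _ in range(reps):
        sample = []
        for _ in range(a):
            A_i = rng.gauss(0.0, var_A ** 0.5)  # litter component, drawn afresh each sample
            sample.append([mu + A_i + rng.gauss(0.0, var_e ** 0.5) for _ in range(n)])
        s2, s2_A = litter_estimates(sample)
        sum_s2 += s2
        sum_sA2 += s2_A
        if s2_A < 0:
            negatives += 1
    return sum_s2 / reps, sum_sA2 / reps, negatives / reps


# Pig gains of table 10.14.1, column (4), two pigs per litter.
TABLE_10_14_1 = [[36, 38], [28, 9], [29, 48], [32, 32], [29, 38],
                 [29, 23], [45, 36], [13, 22], [30, 52], [43, 22]]
print(litter_estimates(TABLE_10_14_1))
print(simulate())
```

The first line reproduces $s^2 = 96.5$ and $s_A^2 \approx 24.0$. Averaged over many samples the estimates should settle near 100 and 25, as in the mean row of table 10.14.2, while the proportion of negative $s_A^2$ should come out a little above one quarter, in line with Example 10.14.1 (1.5 lies slightly below the 25% point 1.59).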

10.15 — Confidence limits for $\sigma_A^2$. Assuming normality, approximate confidence limits for $\sigma_A^2$ have been given by Moriguti (18). We shall illustrate from the turnip greens example (table 10.13.1), for which $n = 4$, $f_1 = 3$, $f_2 = 12$, $s_A^2 = 0.0724$, and $s^2 = 0.0066$. It is necessary to look up four entries in the F-table. If the table of 5% significance levels is used, these determine a two-tailed 90% confidence interval, with 5% on each tail. The 5% values of F needed are as follows:
$F_1 = F_{f_1, f_2} = F_{3,12} = 3.49$
$F_2 = F_{f_1, \infty} = F_{3,\infty} = 2.60$
$F_3 = F_{f_2, f_1} = F_{12,3} = 8.74$
$F_4 = F_{\infty, f_1} = F_{\infty,3} = 8.53$
$F$ = observed value of F = 44.9

The limits are expressed as multiples of the quantity $s^2/n = 0.0066/4 = 0.00165$. The lower limit for $\sigma_A^2$ is

$$\sigma_{AL}^2 = \frac{(F - F_1)(F + F_1 - F_2)}{F F_2}\,\frac{s^2}{n} = \frac{(44.9 - 3.49)(44.9 + 3.49 - 2.60)}{(44.9)(2.60)}(0.00165) = \frac{(41.41)(45.79)}{(44.9)(2.60)}(0.00165) = 0.027.$$

As would be expected, the lower limit becomes zero if $F = F_1$, that is, if $F$ is just significant at the 5% level.

The upper limit is

$$\sigma_{AU}^2 = \left\{F F_4 - 1 + \frac{F_3 - F_4}{F F_3^2}\right\}\frac{s^2}{n} = \left\{(44.9)(8.53) - 1 + \frac{0.21}{(44.9)(8.74)^2}\right\}(0.00165) = 0.63.$$

Frequently, as in this example, the rather unwieldy second term inside the curly bracket is negligible and need not be computed.

To summarize, the estimate is s A 2 = 0.0724, with 90% confidence 
limits 0.027 and 0.63. Earlier, Bross (19) gave approximate fiducial limits, 
using the same five values of F. His limits agree closely with the above 
limits whenever F is significant. 
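The two limits are simple to program once the four F values have been read from a table. In this sketch (the function name is hypothetical) the caller supplies $F_1$ to $F_4$, since Python's standard library has no F quantile function. The sign of the correction term in the upper limit follows the worked arithmetic above, where $F_3 - F_4 = 0.21$ appears in the numerator; in this example the term is negligible either way. Reporting a negative lower limit as zero is an added assumption.

```python
def moriguti_limits(F, F1, F2, F3, F4, s2, n):
    """Approximate two-tailed confidence limits for sigma_A^2 (section 10.15).

    F  : observed variance ratio (between / within)
    F1 : F(f1, f2)    F2 : F(f1, inf)    F3 : F(f2, f1)    F4 : F(inf, f1)
    All four are read from the same one-tailed significance table.
    """
    unit = s2 / n
    lower = (F - F1) * (F + F1 - F2) / (F * F2) * unit
    upper = (F * F4 - 1.0 + (F3 - F4) / (F * F3 ** 2)) * unit
    # Assumption: a negative lower limit (F below F1) is reported as zero.
    return max(lower, 0.0), upper


# Turnip greens example: f1 = 3, f2 = 12, with the 5% points quoted in the text.
low, high = moriguti_limits(F=44.9, F1=3.49, F2=2.60, F3=8.74, F4=8.53, s2=0.0066, n=4)
print(round(low, 3), round(high, 2))
```

For the turnip greens data the call reproduces the limits 0.027 and 0.63. For Example 10.15.1 the same function applies with $f_1 = 6$, $f_2 = 35$ and the corresponding tabled values.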

If the distributions of $A_i$ and $\varepsilon_{ij}$ are non-normal, having positive kurtosis, the variance of $s_A^2$ is increased, and the above confidence intervals are too narrow.


EXAMPLE 10.15.1 — In estimating the amount of plankton in an area of sea, seven 
runs (called hauls) were made, with six nets on each run (20). Estimate the component of 
variance between hauls and its 90% confidence limits. 



Source of Variation    Degrees of Freedom    Mean Square
Between hauls          6                     0.1011
Within hauls           35                    0.0208

Ans. $s_A^2 = 0.0134$, with limits (0.0044, 0.058).

10.16 — Samples within samples. Nested classifications. Each sample 
may be composed of sub-samples and these in turn may be sub-sampled, 
etc. The repeated sampling and sub-sampling gives rise to nested or 
hierarchical classifications, as they are sometimes called.
Table 10.16.1 gives an example. This is a part of the turnip greens experiment cited earlier (17). The four plants were taken at random, then three leaves were randomly selected from each plant. From each leaf were taken two samples of 100 mg. in which calcium was determined by microchemical methods. The immediate objective is to separate the sums of squares due to the sources of variation: plants, leaves of the same plant, and determinations on the leaves.

The calculations are given under table 10.16.1. The total sums of squares for determinations, leaves, and plants are first obtained by the usual formulas. The sum of squares between leaves of the same plant is found by subtracting the sum of squares between plants from that between leaves, as shown. Similarly, the sum of squares between determinations on the same leaf is obtained by deducting the total sum of squares between leaves from that between determinations. This process can be repeated with successive sub-sampling.

TABLE 10.16.1
Calcium Concentration (Per Cent, Dry Basis) in $b = 3$ Leaves From Each of $a = 4$ Turnip Plants, $n = 2$ Determinations Per Leaf. Analysis of Variance

Plant, $i$         Leaf, $ij$         Determinations, $X_{ijk}$    $X_{ij.}$    $X_{i..}$
($i = 1 \ldots a$)  ($j = 1 \ldots b$)
1                  1                  3.28   3.09                  6.37
                   2                  3.52   3.48                  7.00
                   3                  2.88   2.80                  5.68         19.05
2                  1                  2.46   2.44                  4.90
                   2                  1.87   1.92                  3.79
                   3                  2.19   2.19                  4.38         13.07
3                  1                  2.77   2.66                  5.43
                   2                  3.74   3.44                  7.18
                   3                  2.55   2.55                  5.10         17.71
4                  1                  3.78   3.87                  7.65
                   2                  4.07   4.12                  8.19
                   3                  3.31   3.31                  6.62         22.46
Grand total $X_{...}$                                                           72.29

Total size $= abn = (4)(3)(2) = 24$ determinations

$C = (X_{...})^2/abn = (72.29)^2/24 = 217.7435$
Determinations: $\sum X_{ijk}^2 - C = 3.28^2 + \cdots + 3.31^2 - C = 10.2704$
Leaves: $\sum X_{ij.}^2/n - C = (6.37^2 + \cdots + 6.62^2)/2 - C = 10.1905$
Plants: $\sum X_{i..}^2/bn - C = (19.05^2 + \cdots + 22.46^2)/6 - C = 7.5603$
Leaves of the same plant = Leaves - Plants = 10.1905 - 7.5603 = 2.6302
Determinations on same leaf = Determinations - Leaves = 10.2704 - 10.1905 = 0.0799

Source of Variation         Degrees of Freedom    Sum of Squares    Mean Square
Plants                      3                     7.5603            2.5201
Leaves in plants            8                     2.6302            0.3288
Determinations in leaves    12                    0.0799            0.0067
Total                       23                    10.2704
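The subtraction rules of table 10.16.1 translate directly into code. The sketch below (hypothetical function name, balanced data only) forms the three total sums of squares, subtracts to obtain the nested ones, and returns the mean squares together with the component estimates of table 10.16.2 and the estimated variance of the grand mean.

```python
# Calcium, per cent dry basis: 4 plants x 3 leaves x 2 determinations (table 10.16.1).
PLANTS = [
    [[3.28, 3.09], [3.52, 3.48], [2.88, 2.80]],
    [[2.46, 2.44], [1.87, 1.92], [2.19, 2.19]],
    [[2.77, 2.66], [3.74, 3.44], [2.55, 2.55]],
    [[3.78, 3.87], [4.07, 4.12], [3.31, 3.31]],
]


def nested_two_stage(data):
    """Balanced two-stage nested analysis using the subtraction rules of table 10.16.1."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    obs = [x for plant in data for leaf in plant for x in leaf]
    C = sum(obs) ** 2 / (a * b * n)                     # correction for the mean
    ss_det = sum(x * x for x in obs) - C                # "Determinations" (total)
    ss_leaves = sum(sum(leaf) ** 2 for plant in data for leaf in plant) / n - C
    ss_plants = sum(sum(map(sum, plant)) ** 2 for plant in data) / (b * n) - C
    ms_plants = ss_plants / (a - 1)
    ms_leaves = (ss_leaves - ss_plants) / (a * (b - 1))   # leaves in plants
    ms_det = (ss_det - ss_leaves) / (a * b * (n - 1))     # determinations in leaves
    return {
        "ms": (ms_plants, ms_leaves, ms_det),
        "s2": ms_det,                                    # estimates sigma^2
        "s2_B": (ms_leaves - ms_det) / n,                # estimates sigma_B^2
        "s2_A": (ms_plants - ms_leaves) / (b * n),       # estimates sigma_A^2
        "var_mean": ms_plants / (a * b * n),             # estimated variance of the mean
    }


res = nested_two_stage(PLANTS)
print("mean squares:", [round(m, 4) for m in res["ms"]])
print("components:", round(res["s2"], 4), round(res["s2_B"], 4), round(res["s2_A"], 4))
print("variance of mean:", round(res["var_mean"], 3))
```

It should reproduce the mean squares 2.5201, 0.3288 and 0.0067, the components 0.1610 and 0.3652, and the value 0.105 for the variance of the mean.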

The model being used is

$$X_{ijk} = \mu + A_i + B_{ij} + \varepsilon_{ijk}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, b, \quad k = 1, \ldots, n,$$

$$A_i = \mathcal{N}(0, \sigma_A), \qquad B_{ij} = \mathcal{N}(0, \sigma_B), \qquad \varepsilon_{ijk} = \mathcal{N}(0, \sigma), \tag{10.16.1}$$

where $A$ refers to plants and $B$ to leaves. The variables $A_i$, $B_{ij}$, and $\varepsilon_{ijk}$ are all assumed independent. Roman letters are used to denote plants and leaves because they are random variables, not constants.


TABLE 10.16.2
Completed Analysis of Variance of Turnip Greens Data

Source of Variation         Degrees of Freedom    Mean Square    Parameters Estimated
Plants                      3                     2.5201         $\sigma^2 + n\sigma_B^2 + bn\sigma_A^2$
Leaves in plants            8                     0.3288         $\sigma^2 + n\sigma_B^2$
Determinations in leaves    12                    0.0067         $\sigma^2$

$n = 2$, $b = 3$, $a = 4$. $s^2 = 0.0067$ estimates $\sigma^2$. $s_B^2 = (0.3288 - 0.0067)/2 = 0.1610$ estimates $\sigma_B^2$. $s_A^2 = (2.5201 - 0.3288)/6 = 0.3652$ estimates $\sigma_A^2$.


In the completed analysis of variance, table 10.16.2, the components of 
variance are shown. Each component in a sub-sample is included among 
those in the sample above it. The estimates are calculated as indicated. 
Null hypotheses which may be tested are:

1. $\sigma_A = 0$: $F = 2.5201/0.3288 = 7.66$ estimates $(\sigma^2 + n\sigma_B^2 + bn\sigma_A^2)/(\sigma^2 + n\sigma_B^2)$, with $f = 3, 8$.

2. $\sigma_B = 0$: $F = 0.3288/0.0067 = 49$ estimates $(\sigma^2 + n\sigma_B^2)/\sigma^2$, with $f = 8, 12$.

For the first, with degrees of freedom $f_1 = 3$ and $f_2 = 8$, $F$ is almost on its 1% point, 7.59; for the second, with degrees of freedom 8 and 12, $F$ is far beyond its 1% point, 4.50. Evidently, in the sampled population the per cent calcium varies both from leaf to leaf and from plant to plant.

As with a single sub-classification (leaves and determinations in section 10.13), it may be shown that the estimated variance of the sample mean per determination is given by the mean square between plants, divided by the number of determinations. This estimated variance can be expressed in terms of the estimated components of variance from table 10.16.2, as follows:

$$s_{\bar X}^2 = \frac{2.5201}{24} = 0.105 = \frac{0.0067 + n(0.1610) + bn(0.3652)}{nab} = \frac{0.0067}{nab} + \frac{0.1610}{ab} + \frac{0.3652}{a}.$$
This suggests that more information per dollar may be got by decreasing $n$, the number of expensive determinations per leaf, which have a small component, then increasing $b$ or $a$, the numbers of leaves or plants. Plants presumably cost more than leaves, but the component is also larger. How to balance these elements is the topic of section 17.12.

Confidence limits for $\sigma_A^2$ and $\sigma_B^2$ are calculated by the method described in section 10.15.

EXAMPLE 10.16.1 — Verify that the sum of squares for Determinations in leaves, as found by subtraction in table 10.16.1, is the sum of squares of deviations of the determinations from their respective leaf means. Ans. Since the C term cancels, Determinations - Leaves is equal to

$$\sum_i \sum_j \sum_k X_{ijk}^2 - \frac{1}{n}\sum_i \sum_j X_{ij.}^2 = \sum_i \sum_j \sum_k (X_{ijk} - \bar X_{ij.})^2$$

by the usual shortcut rule for finding a sum of squares of deviations, where $X_{ij.}$ is the total and $\bar X_{ij.}$ the mean of the $n$ determinations on the $j$th leaf of the $i$th plant.

EXAMPLE 10.16.2 — From equation 10.16.1 for the model, show that the variance of the sample mean is $(\sigma^2 + n\sigma_B^2 + bn\sigma_A^2)/abn$, and that an unbiased estimate of it is given by the mean square between plants, divided by $abn$, i.e., by $2.5201/24 = 0.105$, as stated in section 10.16.

EXAMPLE 10.16.3 — If one determination were made on each of two leaves from each 
of ten plants, what is your estimate of the variance of the sample mean? Ans. 0.045. 

EXAMPLE 10.16.4 — With one determination on one leaf from each plant, how many plants must be taken in order to reduce $s_{\bar X}$ to 0.2? Ans. About 14. (This estimate is very rough, since the mean square between plants has only 3 d.f.)
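The formula of Example 10.16.2, with each component replaced by its estimate from table 10.16.2, gives the estimated variance of the mean for any balanced design. The sketch below (hypothetical function names) uses it to reproduce the answers to Examples 10.16.3 and 10.16.4; as Example 10.16.4 warns, such figures are rough because the plant component rests on only 3 d.f.

```python
import math

# Component estimates from table 10.16.2.
S2, S2_B, S2_A = 0.0067, 0.1610, 0.3652


def var_of_mean(a, b, n, s2=S2, s2_B=S2_B, s2_A=S2_A):
    """Estimated variance of the mean: a plants, b leaves per plant, n determinations per leaf."""
    return s2 / (a * b * n) + s2_B / (a * b) + s2_A / a


def plants_for_se(target_se, b=1, n=1):
    """Smallest number of plants whose estimated standard error does not exceed target_se."""
    a = 1
    while math.sqrt(var_of_mean(a, b, n)) > target_se:
        a += 1
    return a


print(round(var_of_mean(4, 3, 2), 3))   # the design actually used
print(round(var_of_mean(10, 2, 1), 3))  # Example 10.16.3
print(plants_for_se(0.2))               # Example 10.16.4
```

With $a = 4$, $b = 3$, $n = 2$ it returns about 0.105, the value found from the mean square between plants; ten plants with two leaves each and one determination per leaf give about 0.045; and 14 plants are needed to bring the standard error down to 0.2.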

10.17 — Samples within samples. Mixed model. In some applications of sub-sampling, the major classes have fixed effects that are to be estimated. An instance is an evaluation of the breeding value of a set of five sires in pig-raising. Each sire is mated to a random group of dams, each mating producing a litter of pigs whose characteristics are the criterion. The model is:

$$X_{ijk} = \mu + \alpha_i + B_{ij} + \varepsilon_{ijk} \tag{10.17.1}$$

The $\alpha_i$ are constants ($\sum \alpha_i = 0$) associated with the sires, but the $B_{ij}$ and the $\varepsilon_{ijk}$ are random variables corresponding to dams and offspring. Hence the model is called mixed.

Table 10.17.1 is an example with $b = 2$ dams for each sire and $n = 2$ pigs chosen from each litter for easy analysis (from records of the Iowa Agricultural Experiment Station). The calculations proceed exactly as in the preceding section. The only change is that in the mean square for sires, the term $nb\kappa^2$, where $\kappa^2 = \sum \alpha_i^2/(a - 1)$, replaces $nb\sigma_A^2$.

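In a mixed model of this type, two points must be noted. From equation 10.17.1, the observed class mean may be written

$$\bar X_{i..} = \mu + \alpha_i + \bar B_{i.} + \bar\varepsilon_{i..},$$

where $\bar B_{i.}$ is the average of $b$ values of the $B_{ij}$ and $\bar\varepsilon_{i..}$ is the average of $nb$ values of the $\varepsilon_{ijk}$. Thus the variance of $\bar X_{i..}$, considered as an estimate of $\mu + \alpha_i$, is

$$V(\bar X_{i..}) = \frac{\sigma_B^2}{b} + \frac{\sigma^2}{nb} = \frac{\sigma^2 + n\sigma_B^2}{nb}.$$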


TABLE 10.17.1
Average Daily Gain of Two Pigs of Each Litter

Sire    Dam    Pig Gains       Sums (Dam)    Sums (Sire)
1       1      2.77   2.38     5.15
        2      2.58   2.94     5.52          10.67
2       1      2.28   2.22     4.50
        2      3.01   2.61     5.62          10.12
3       1      2.36   2.71     5.07
        2      2.72   2.74     5.46          10.53
4       1      2.87   2.46     5.33
        2      2.31   2.24     4.55          9.88
5       1      2.74   2.56     5.30
        2      2.50   2.48     4.98          10.28
Total                                        51.48

Source of Variation    Degrees of Freedom    Mean Square    Parameters Estimated
Sires                  4                     0.0249         $\sigma^2 + n\sigma_B^2 + nb\kappa^2$
Dams, same sire        5                     0.1127         $\sigma^2 + n\sigma_B^2$
Pairs, same dam        10                    0.0387         $\sigma^2$

$n = 2$, $b = 2$. $s^2 = 0.0387$ estimates $\sigma^2$; $s_B^2 = (0.1127 - 0.0387)/2 = 0.0370$ estimates $\sigma_B^2$; 0 estimates $\kappa^2$.

To test $\sigma_B^2 = 0$: $F = 0.1127/0.0387 = 2.91$; $F_{0.05} = 3.33$.


The analysis of variance shows, first, that the mean square between dams of the same sire is the relevant mean square for sire means, being an unbiased estimate of $\sigma^2 + n\sigma_B^2$. The standard error of a sire mean is $\sqrt{0.1127/4} = 0.168$, with 5 d.f. Secondly, the F ratio for testing the null hypothesis that all $\alpha_i$ are zero is the ratio 0.0249/0.1127. Since this ratio is substantially less than 1, there is no indication of differences between sires in these data.
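The mixed-model calculations for table 10.17.1 can be sketched in the same way. The point the code makes is which mean square serves as the error term: sires are compared with dams within sires, and dams with pairs within dams. Function and variable names are hypothetical.

```python
import math

# Average daily gains: 5 sires x 2 dams x 2 pigs per litter (table 10.17.1).
SIRES = [
    [[2.77, 2.38], [2.58, 2.94]],
    [[2.28, 2.22], [3.01, 2.61]],
    [[2.36, 2.71], [2.72, 2.74]],
    [[2.87, 2.46], [2.31, 2.24]],
    [[2.74, 2.56], [2.50, 2.48]],
]


def nested_mean_squares(data):
    """Mean squares for sires, dams within sires, and pigs within dams (balanced data)."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    obs = [x for sire in data for dam in sire for x in dam]
    C = sum(obs) ** 2 / len(obs)
    ss_sires = sum(sum(map(sum, sire)) ** 2 for sire in data) / (b * n) - C
    ss_dams_total = sum(sum(dam) ** 2 for sire in data for dam in sire) / n - C
    ss_total = sum(x * x for x in obs) - C
    return (
        ss_sires / (a - 1),
        (ss_dams_total - ss_sires) / (a * (b - 1)),
        (ss_total - ss_dams_total) / (a * b * (n - 1)),
    )


ms_sires, ms_dams, ms_pairs = nested_mean_squares(SIRES)
b, n = len(SIRES[0]), len(SIRES[0][0])
f_sires = ms_sires / ms_dams          # sires are tested against dams, not against pairs
f_dams = ms_dams / ms_pairs           # test of sigma_B^2 = 0
se_sire_mean = math.sqrt(ms_dams / (b * n))
print(round(ms_sires, 4), round(ms_dams, 4), round(ms_pairs, 4))
print(round(f_sires, 2), round(f_dams, 2), round(se_sire_mean, 3))
```

The output should match the mean squares 0.0249, 0.1127 and 0.0387, the ratio 2.91 for dams, a sire ratio well below 1, and a standard error of about 0.168 for a sire mean.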

10.18 — Samples of unequal sizes. Random effects. This case occurs commonly in family studies in human and animal genetics and in the social sciences. The model being used is a form of model II:

$$X_{ij} = \mu + A_i + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n_i, \qquad A_i = \mathcal{N}(0, \sigma_A), \quad \varepsilon_{ij} = \mathcal{N}(0, \sigma).$$

The new feature is that $n_i$, the size of sample of the $i$th class, varies from class to class. The total sample size is $N = \sum n_i$. All $A_i$ and $\varepsilon_{ij}$ are assumed independent.

The computations for the analysis of variance and the F-test of the null hypothesis $\sigma_A^2 = 0$ are the same as for fixed effects, as given in section 10.12. With equal $n_i$ ($= n$), the mean square between classes was found to be an unbiased estimate of $\sigma^2 + n\sigma_A^2$ (section 10.13). With unequal $n_i$, the corresponding expression is $\sigma^2 + n_0\sigma_A^2$, where

$$n_0 = \frac{1}{a - 1}\left(N - \frac{\sum n_i^2}{N}\right) = \bar n - \frac{\sum (n_i - \bar n)^2}{(a - 1)N}.$$

The first equation is the form used for computing $n_0$. The second equation shows that $n_0$ is always less than the arithmetic mean $\bar n$ of the $n_i$ (unless all the $n_i$ are equal), although usually only slightly less.

Consequently, if $s_b^2$ and $s_w^2$ are the mean squares between and within classes, respectively, unbiased estimates of the two components of variance $\sigma^2$ and $\sigma_A^2$ are given by

$$\hat\sigma^2 = s_w^2, \qquad \hat\sigma_A^2 = (s_b^2 - s_w^2)/n_0.$$

With unequal $n_i$, some mathematical complexities arise that have not yet been overcome in a form suitable for practical use. The estimate $\hat\sigma_A^2$, while unbiased whether the $A_i$ and $\varepsilon_{ij}$ are normally distributed or not, is not fully efficient unless $\sigma_A^2$ is small. The method given for finding confidence limits for $\sigma_A^2$ with equal $n$ (section 10.15) does not apply. An ingenious method of finding confidence limits for the ratio $\sigma_A^2/\sigma^2$ was, however, given by Wald (21). Whenever feasible, it pays to keep the sample sizes equal.

EXAMPLE 10.18.1 — In research on artificial insemination of cows, a series of semen samples from a bull are sent out and tested for their ability to produce conceptions. The following data, from a larger set kindly supplied by Dr. G. W. Salisbury, show the percentages of conceptions obtained from the samples for six bulls. In the analysis of variance, the total sum of squares, uncorrected, was 111,076. Verify the analysis of variance, the value of $n_0$, and the estimates of the two variance components. (Since the data are percentages based on slightly differing numbers of tests, the assumption that $\sigma^2$ is constant in these data is not quite correct.)

Bull (i)    Percentages of Conceptions to Services for Successive Samples    $n_i$    Sum
1           46, 31, 37, 62, 30                                               5        206
2           70, 59                                                           2        129
3           52, 44, 57, 40, 67, 64, 70                                       7        394
4           47, 21, 70, 46, 14                                               5        198
5           42, 64, 50, 69, 77, 81, 87                                       7        470
6           35, 68, 59, 38, 57, 76, 57, 29, 60                               9        479
Total                                                                        35       1876

Source           d.f.    SS       M.S.    E(M.S.)
Between bulls    5       3,772    754     $\sigma^2 + 5.67\sigma_A^2$
Within bulls     29      6,750    233     $\sigma^2$

233 estimates $\sigma^2$; $(754 - 233)/5.67 = 92$ estimates $\sigma_A^2$.
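The sketch below (hypothetical function name) computes the quantities this example asks for, with $n_0$ taken from its computing form $(N - \sum n_i^2/N)/(a - 1)$. Run on the percentages as listed, it reproduces the uncorrected total 111,076 and $n_0 \approx 5.67$. However, it splits the corrected total of 10,522.4 into about 3,322 between bulls and 7,200 within bulls, rather than the printed 3,772 and 6,750. That split would give mean squares of roughly 664 and 248 and an estimate of about 73 for $\sigma_A^2$. Because the listed bull totals and the uncorrected total both agree with the data, the printed split may be worth rechecking by hand.

```python
# Percentages of conceptions for successive semen samples, six bulls (Example 10.18.1).
BULLS = [
    [46, 31, 37, 62, 30],
    [70, 59],
    [52, 44, 57, 40, 67, 64, 70],
    [47, 21, 70, 46, 14],
    [42, 64, 50, 69, 77, 81, 87],
    [35, 68, 59, 38, 57, 76, 57, 29, 60],
]


def unequal_one_way(groups):
    """One-way analysis with unequal class sizes and model II component estimates."""
    a = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    C = sum(sum(g) for g in groups) ** 2 / N          # correction for the mean
    raw_ss = sum(x * x for g in groups for x in g)     # uncorrected total sum of squares
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - C
    ss_within = raw_ss - C - ss_between
    ms_b, ms_w = ss_between / (a - 1), ss_within / (N - a)
    n0 = (N - sum(k * k for k in sizes) / N) / (a - 1)
    return {"raw_ss": raw_ss, "n0": n0, "ss_between": ss_between,
            "ss_within": ss_within, "s2": ms_w, "s2_A": (ms_b - ms_w) / n0}


for key, value in unequal_one_way(BULLS).items():
    print(key, round(value, 2))
```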
EXAMPLE 10.18.2—The preceding example is one in which we might consider either fixed or random effects of bulls, depending on the objectives. If these six bulls were available for an artificial insemination program, we would be interested in comparing the percentages of success of these specific bulls in a fixed-effects analysis.

10.19 — Samples within samples. Unequal sizes. Both samples and 
sub-samples may be of unequal sizes. Computational methods for any 
number of levels (samples, sub-samples, sub-sub-samples, etc.) have been 
developed by Gower (22) and by Gates and Shine (23), following earlier 
work by Ganguli (24). The analysis of variance is straightforward although tedious. A general procedure for finding unbiased estimates of the components of variance at each level will be given.

Our example is from a small survey of wheat yields in six districts in 
England (25). One or more farms were selected in each district, and from 
one to three wheat fields from each selected farm. Strictly, this is a mixed 
model, since the districts are fixed; further, the farms within districts were 
not randomly selected. The data serve, however, to illustrate the com- 
putations. 

The computations are most easily followed if the data are set out as in table 10.19.1. The lowest level (fields) is denoted by 0. The yield, $X_{0k}$, and the number of observations in each yield, $N_{0k}$, are written down. In this example, as in most applications, the $N_{0k}$ are all 1, each observation being the yield of one field.

The $X_{0k}$ and the $N_{0k}$ are added to give the totals, $X_{1k}$ and $N_{1k}$, at the next level, farms. Similarly, the $X_{1k}$ and the $N_{1k}$ are added to give the district totals, $X_{2k}$ and $N_{2k}$. Finally, the district totals are added to give $X_{3k}$ and $N_{3k}$, the grand total and the total number of recorded observations, respectively.

To obtain the sums of squares in the analysis of variance, first calculate for each level the quantity

$$S_i = \sum_k X_{ik}^2/N_{ik}.$$

$S_3$, for instance, is $(1063)^2/36 = 31{,}388.0$, the usual correction for the mean. At Level 2 (Districts) we have

$$S_2 = 110^2/4 + 91^2/3 + \cdots + 432^2/13 = 31{,}849.3$$

To obtain the d.f., count the number of classes $C_i$ at each level. These are $C_0 = 36$, $C_1 = 25$, $C_2 = 6$, $C_3 = 1$, as shown at the foot of table 10.19.1. The $C_i$ and the $S_i$ provide the d.f. and the sums of squares in the analysis of variance, as shown in table 10.19.2 on p. 293.

The rule for calculating the d.f. and the sums of squares is a straight- 
forward extension of the rule for two levels given in table 10.12.1. 

We now express the expected values of the three mean squares in terms of the components of variance for districts ($\sigma_2^2$), farms ($\sigma_1^2$), and fields ($\sigma_0^2$). For this we use two sets of auxiliary quantities, $\gamma_{ij}$ and $k_{ij}$.
TABLE 10.19.1
Wheat Yields (gms. per 0.0000904 Acre) to Illustrate Estimation of Components of Variance in Nested Classifications With Unequal Numbers

Level 0: Fields      Level 1: Farms       Level 2: Districts     Level 3: Grand Total
$X_{0k}$  $N_{0k}$   $X_{1k}$  $N_{1k}$   $X_{2k}$  $N_{2k}$     $X_{3k}$  $N_{3k}$
23    1
19    1              42    2
31    1
37    1              68    2              110   4
33    1
29    1              62    2
29    1              29    1              91    3
36    1
29    1
33    1              98    3              98    3
11    1
21    1              32    2
23    1
18    1              41    2
33    1              33    1
23    1              23    1
26    1              26    1
39    1              39    1
20    1              20    1
24    1              24    1
36    1              36    1              274   11
25    1
33    1              58    2              58    2
28    1
31    1              59    2
25    1
42    1              67    2
32    1
36    1              68    2
41    1              41    1
35    1              35    1
16    1              16    1
30    1              30    1
40    1              40    1
32    1              32    1
44    1              44    1              432   13             1063   36

$C_i$: 36            25                   6                      1

For the $\gamma_{ij}$, $i$ and $j$ take the values 0, 1, 2, 3, with $i \ge j$. In the diagonal, $\gamma_{ii}$ always equals the total number of observations, in this case 36. Further, when all $N_{0k}$ are 1, $\gamma_{i0} = C_i$, the number of classes at level $i$. Thus, we write 1, 6, 25, and 36 in the column $\gamma_{i0}$ in table 10.19.3. For the remaining $\gamma_{ij}$, the rule is (using table 10.19.1):

Sum the squares of the $N_{jk}$, each square divided by the next entry $N_{ik}$ at level $i$ (that is, by the size of the level-$i$ class that contains it). It sounds puzzling but should be clear from the examples.




TABLE 10.19.2
Analysis of Variance of Wheat Yields

Source of Variation                 Degrees of Freedom    Sum of Squares             Mean Square
Districts (level 2)                 $C_2 - C_3 = 5$       $S_2 - S_3 = 461.3$        92.3
Farms within districts (level 1)    $C_1 - C_2 = 19$      $S_1 - S_2 = 1{,}349.5$    71.0
Fields within farms (level 0)       $C_0 - C_1 = 11$      $S_0 - S_1 = 310.2$        28.2


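TABLE 10.19.3
Values of Auxiliary Quantities $\gamma_{ij}$ and $k_{ij}$

$\gamma_{ij}$    $j = 0$    $j = 1$    $j = 2$    $j = 3$
$i = 3$          1          1.67       9.11       36
$i = 2$          6          11.49      36
$i = 1$          25         36
$i = 0$          36

$k_{ij}$         $j = 0$    $j = 1$    $j = 2$
$i = 2$          5          9.82       26.89
$i = 1$          19         24.51
$i = 0$          11

$$\gamma_{21} = \frac{2^2 + 2^2}{4} + \frac{2^2 + 1^2}{3} + \frac{3^2}{3} + \frac{2^2 + 2^2 + 1^2 + \cdots + 1^2}{11} + \frac{2^2}{2} + \frac{2^2 + 2^2 + 2^2 + 1^2 + \cdots + 1^2}{13} = 11.49$$

$$\gamma_{32} = (4^2 + 3^2 + 3^2 + 11^2 + 2^2 + 13^2)/36 = 9.11$$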


For the $k_{ij}$, $i$ and $j$ take the values 0, 1, 2, with $i \ge j$, and

$$k_{ij} = \gamma_{ij} - \gamma_{i+1,\,j}.$$

That is, to find any $k_{ij}$, start with $\gamma_{ij}$ and subtract the number immediately above it. Thus, $k_{22} = 36 - 9.11 = 26.89$.

The quantity $k_{ij}$ is the coefficient of $\sigma_j^2$ in the expected value of the sum of squares at level $i$ in the analysis of variance. To find the expected values of the corresponding mean squares, divide by the number of d.f. at level $i$. These mean squares (from table 10.19.2) and their expected values appear in table 10.19.4. For example, the coefficient 1.290 of $\sigma_1^2$ in the farms mean square is $k_{11}/19 = 24.51/19$, and so on.


TABLE 10.19.4
Expected Values of the Mean Squares

Level                Degrees of Freedom    Mean Square    Expected Value
Districts ($i = 2$)  5                     92.3           $\sigma_0^2 + 1.964\sigma_1^2 + 5.378\sigma_2^2$
Farms ($i = 1$)      19                    71.0           $\sigma_0^2 + 1.290\sigma_1^2$
Fields ($i = 0$)     11                    28.2           $\sigma_0^2$


A new feature is that the coefficient of $\sigma_1^2$ is no longer the same in the Districts and Farms mean squares. Thus, the ratio 92.3/71.0 cannot be used as an F-test of the null hypothesis $\sigma_2^2 = 0$. However, unbiased estimates of the three components are obtained from table 10.19.4 as follows:

$$s_0^2 = 28.2, \qquad s_1^2 = (71.0 - 28.2)/1.290 = 33.2,$$

$$s_2^2 = [92.3 - 28.2 - (1.964)(33.2)]/5.378 = -0.21.$$

The data give no evidence of real differences in yield between districts.

This method of calculation holds for any number of levels. For 
large bodies of data the computations may be programmed for an elec- 
tronic computer. 
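The back-substitution used for the wheat yields can be sketched in a few lines of Python. This is a minimal sketch. It assumes the mean squares and coefficients have already been read from tables 10.19.2 and 10.19.4, and `nested_components` is a hypothetical helper, not something from the text. The calculation starts at the lowest level, where the mean square estimates $\sigma_0^2$ directly, and then works upward one component at a time.

```python
def nested_components(mean_squares, coefs):
    """Estimate variance components for a three-level nested classification.

    mean_squares: (MS at level 0, MS at level 1, MS at level 2)
    coefs: dict mapping (level i, component j) to the coefficient of
           sigma_j^2 in the expected mean square at level i
           (the coefficient of sigma_0^2 is always 1).
    Returns (s0^2, s1^2, s2^2).
    """
    ms0, ms1, ms2 = mean_squares
    s0 = ms0                                   # E(MS_0) = sigma_0^2
    s1 = (ms1 - s0) / coefs[(1, 1)]            # E(MS_1) = sigma_0^2 + c11 sigma_1^2
    # E(MS_2) = sigma_0^2 + c21 sigma_1^2 + c22 sigma_2^2
    s2 = (ms2 - s0 - coefs[(2, 1)] * s1) / coefs[(2, 2)]
    return s0, s1, s2


# Wheat-yield values from tables 10.19.2 and 10.19.4
s0, s1, s2 = nested_components(
    (28.2, 71.0, 92.3),
    {(1, 1): 1.290, (2, 1): 1.964, (2, 2): 5.378},
)
print(f"s0^2 = {s0:.1f}, s1^2 = {s1:.1f}, s2^2 = {s2:.2f}")
```

The district component comes out slightly negative, which is consistent with the verdict that districts show no real differences.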

10.20 — Intraclass correlation. We revert to a single classification 
with $n$ members per class. When the component $\sigma_A^2 > 0$, we have seen
that members of the same class tend to act alike. An alternative to model
II for describing this situation is to suppose that the observations $X_{ij}$ are
all distributed about the same mean $\mu$ with the same variance $\sigma^2$, but that
any two members of the same class ($i$ = constant) have a common cor-
relation coefficient $\rho_I$, called the intraclass correlation coefficient. Actually,
this model antedates the analysis of variance.

With this model it can be shown by algebra that the expected values
of the mean squares in the analysis of variance are as follows:

Source of Variation    Mean Square    Expected Value
Between classes        $s_b^2$        $\sigma^2\{1 + (n - 1)\rho_I\}$
Within classes         $s_w^2$        $\sigma^2(1 - \rho_I)$


This model is useful in applications in which it is natural to think of mem-
bers of the same class as correlated. It is frequently employed in studies of
twins ($n = 2$). The model is more general than the components of
variance model. If $\rho_I$ is negative, note that $s_b^2$ has a smaller expected
value than $s_w^2$. With model II, this cannot happen. But if, for instance,
four young animals in a pen compete for an insufficient supply of food,
the stronger animals may drive away the weaker and may regularly get
most of the food. For this reason the variance in weight within pens
may be larger than that between pens, this being a real phenomenon
and not an accident of sampling. We say that there is a negative correla-
tion $\rho_I$ between the weights within a pen. One restriction on negative
values of $\rho_I$ is that $\rho_I$ cannot be less than $-1/(n - 1)$. This is so because
the expected value of $s_b^2$ must be greater than or equal to zero.

From the analysis of variance it is clear that $(s_b^2 - s_w^2)$ estimates
$n\rho_I\sigma^2$, while $\{s_b^2 + (n - 1)s_w^2\}$ estimates $n\sigma^2$. This suggests that as an
estimate of $\rho_I$ we take

$$r_I = \frac{s_b^2 - s_w^2}{s_b^2 + (n - 1)s_w^2} \qquad (10.20.1)$$

As will be seen presently, a slightly different estimate of $\rho_I$ is obtained
when we approach the problem from the viewpoint of correlation.

The data on identical twins in table 10.20.1 illustrate a high positive
correlation. The numbers of finger ridges are nearly the same for the two
members of each pair but differ markedly among pairs.

TABLE 10.20.1

Number of Finger Ridges on Both Hands of Individuals in 12 Pairs
of Female Identical Twins
[Data from Newman, Freeman, and Holzinger (34)]

        Finger Ridges             Finger Ridges             Finger Ridges
Pair    of Individuals    Pair    of Individuals    Pair    of Individuals
1       71, 71            5       76, 70            9       114, 113
2       79, 82            6       83, 82            10      94, 91
3       105, 99           7       114, 113          11      75, 83
4       115, 114          8       57, 44            12      76, 72

Analysis of Variance

Source of Variation    Degrees of Freedom    Mean Square
Twin pairs             11                    817.31
Individuals            12                    14.29

$s_w^2 = 14.29$, $s_A^2 = 401.51$, $r_I = 0.966$

From the analysis of variance, the estimate of $\rho_I$ is ($n = 2$)

$$r_I = \frac{817.31 - 14.29}{817.31 + 14.29} = 0.966$$
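The calculation of $r_I$ from the analysis of variance can be sketched in Python. This is a minimal sketch, assuming the pairs are stored as tuples, and the function names are illustrative rather than taken from the text. It computes $s_b^2$ and $s_w^2$ for classes of equal size and then applies equation 10.20.1.

```python
# Finger-ridge counts for the 12 twin pairs of table 10.20.1
pairs = [(71, 71), (79, 82), (105, 99), (115, 114), (76, 70), (83, 82),
         (114, 113), (57, 44), (114, 113), (94, 91), (75, 83), (76, 72)]


def one_way_mean_squares(classes):
    """Between- and within-class mean squares for classes of equal size n."""
    a, n = len(classes), len(classes[0])
    grand = sum(sum(c) for c in classes) / (a * n)
    means = [sum(c) / n for c in classes]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for c, m in zip(classes, means) for x in c)
    return ss_between / (a - 1), ss_within / (a * (n - 1))


def intraclass_r(classes):
    """Equation 10.20.1: r_I = (s_b^2 - s_w^2) / (s_b^2 + (n - 1) s_w^2)."""
    n = len(classes[0])
    sb2, sw2 = one_way_mean_squares(classes)
    return (sb2 - sw2) / (sb2 + (n - 1) * sw2)


sb2, sw2 = one_way_mean_squares(pairs)
print(round(sb2, 2), round(sw2, 2), round(intraclass_r(pairs), 3))  # 817.31 14.29 0.966
```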

In chapter 7, the ordinary correlation coefficient between $X$ and $Y$
was estimated as

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

With twin data, which member of a pair shall we call $X$ and which
$Y$? The solution is to count each point twice, once with the first member
of a pair as $X$, and once with the first member as $Y$. Thus, pair 2 is entered
as (79, 82) and also as (82, 79), while pair 1, where the order makes no
difference, is entered as (71, 71) twice. With this method the $X$ and $Y$
samples both have the same mean and the same variance. If $(X, X')$
denote the observations for a typical pair, you may verify that the cor-
relation coefficient becomes

$$r_I' = \frac{2\sum (X - \bar{X})(X' - \bar{X})}{\sum (X - \bar{X})^2 + \sum (X' - \bar{X})^2}$$

where the sums are over the $a$ pairs and $\bar{X}$ is the mean of all observations.
For the finger ridges, $r_I' = 0.962$.
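Here is a short Python sketch of the double-entry calculation. It assumes plain lists of the table 10.20.1 counts, and `pearson` is an illustrative helper, not from the text. The sketch builds the $2a$ points explicitly, applies the chapter 7 formula, and checks the result against the shortcut formula for $r_I'$. Carried to four decimals the value is about 0.9626, which the text reports as 0.962.

```python
import math

# Finger-ridge counts for the 12 twin pairs of table 10.20.1
pairs = [(71, 71), (79, 82), (105, 99), (115, 114), (76, 70), (83, 82),
         (114, 113), (57, 44), (114, 113), (94, 91), (75, 83), (76, 72)]


def pearson(xs, ys):
    """Ordinary product-moment correlation (the chapter 7 formula)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Double entry: every pair appears once as (X, X') and once as (X', X).
xs = [x for x, _ in pairs] + [x2 for _, x2 in pairs]
ys = [x2 for _, x2 in pairs] + [x for x, _ in pairs]
r_double = pearson(xs, ys)

# Shortcut formula for r_I', using the mean of all 2a observations.
grand = sum(xs) / len(xs)
num = 2 * sum((x - grand) * (x2 - grand) for x, x2 in pairs)
den = sum((x - grand) ** 2 + (x2 - grand) ** 2 for x, x2 in pairs)
r_shortcut = num / den

print(round(r_double, 4), round(r_shortcut, 4))
```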

With pairs ($n = 2$), intraclass correlations may be averaged and may
have confidence limits set by using the transformation from $r$ to $z$ in
section 7.7. The only changes are: (i) the variance of $z_I$ is $1/(a - 3/2)$,
where $a$ is the number of pairs, as against $1/(a - 3)$ with an ordinary $r$;
(ii) the correction for the bias in $z_I$ is to add $1/(2a - 1)$.
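Confidence limits for $\rho_I$ with pairs can be sketched as follows. This is a minimal sketch under stated assumptions. It uses the usual transformation $z = \tfrac{1}{2}\ln\{(1 + r)/(1 - r)\}$, which is `math.atanh`. It assumes the bias correction $1/(2a - 1)$ is added to $z$ before the limits are formed, and it uses 1.96 as the normal deviate for 95% limits. The function name is hypothetical.

```python
import math


def intraclass_z_limits(r, a, z_crit=1.96):
    """Approximate confidence limits for an intraclass r from a pairs (n = 2).

    Uses the two changes stated in the text: the variance of z is
    1/(a - 3/2), and the bias correction adds 1/(2a - 1) to z (assumed
    here to be applied before the limits are formed).
    """
    z = math.atanh(r)                  # z = 0.5 * ln((1 + r) / (1 - r))
    z += 1 / (2 * a - 1)               # bias correction for z_I
    half_width = z_crit / math.sqrt(a - 1.5)
    # Transform the z limits back to the correlation scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)


low, high = intraclass_z_limits(0.966, a=12)
print(f"95% limits for rho_I: {low:.3f} to {high:.3f}")
```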

With triplets ($n = 3$), each trio $X, X', X''$ specifies six points: $(X, X')$,
$(X', X)$, $(X, X'')$, $(X'', X)$, $(X', X'')$, $(X'', X')$. The number of points rises
rapidly as $n$ rises, and this method of calculating $r_I'$ becomes discouraging.
In 1913, however, Harris (26) discovered a shortened process similar to
the analysis of variance, by showing in effect that

$$r_I' = \frac{(a - 1)s_b^2 - a s_w^2}{(a - 1)s_b^2 + a(n - 1)s_w^2}$$

Comparison with equation 10.20.1 shows that $r_I'$ differs slightly from $r_I$,
the difference being trivial unless $a$ (the number of classes) is small. Since
it is slightly simpler, equation (10.20.1) is more commonly used now as
the sample estimate of $\rho_I$.
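The agreement between Harris's shortcut and the double-entry correlation can be checked numerically. This is a minimal Python sketch. The triplet data are made up purely for illustration, and the function names are hypothetical. `double_entry_r` builds all $n(n - 1)$ ordered points within each class, as described above for trios, while `harris_r` uses only the two mean squares. The two agree to rounding error, and equation 10.20.1 (`anova_r`) gives a slightly different value.

```python
import math
from itertools import permutations


def mean_squares(classes):
    """Between- and within-class mean squares for a classes of n members each."""
    a, n = len(classes), len(classes[0])
    grand = sum(map(sum, classes)) / (a * n)
    means = [sum(c) / n for c in classes]
    sb2 = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    sw2 = sum((x - m) ** 2 for c, m in zip(classes, means) for x in c) / (a * (n - 1))
    return sb2, sw2


def harris_r(classes):
    """Harris's shortcut for the double-entry intraclass correlation r_I'."""
    a, n = len(classes), len(classes[0])
    sb2, sw2 = mean_squares(classes)
    return ((a - 1) * sb2 - a * sw2) / ((a - 1) * sb2 + a * (n - 1) * sw2)


def double_entry_r(classes):
    """Pearson r over every ordered pair of distinct members within each class."""
    xs, ys = [], []
    for c in classes:
        for x, y in permutations(c, 2):      # n(n - 1) points per class
            xs.append(x)
            ys.append(y)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def anova_r(classes):
    """Equation 10.20.1, for comparison."""
    n = len(classes[0])
    sb2, sw2 = mean_squares(classes)
    return (sb2 - sw2) / (sb2 + (n - 1) * sw2)


# Hypothetical triplet data, for illustration only
trios = [(12, 15, 14), (20, 18, 21), (9, 11, 8), (16, 16, 19), (13, 10, 12)]
print(round(harris_r(trios), 4), round(double_entry_r(trios), 4), round(anova_r(trios), 4))
```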

10.21 — Tests of homogeneity of variance. From time to time we have 
raised the question as to whether two or more mean squares differ signifi-
cantly. For two mean squares an answer, using the two-tailed F-test,
was given in section 4.15. With more than two independent estimates of
variance, Bartlett (27) provided a test.

If there are $a$ estimates $s_i^2$, each with the same number of degrees of
freedom $f$, the test criterion is

$$M = 2.3026\, f \left(a \log \bar{s}^2 - \sum \log s_i^2\right), \qquad \bar{s}^2 = \frac{\sum s_i^2}{a}$$

The factor 2.3026 is a constant ($\log_e 10$). On the null hypothesis that each
$s_i^2$ is an estimate of the same $\sigma^2$, the quantity $M/C$ is distributed approxi-
mately as $\chi^2$ with $(a - 1)$ d.f., where

$$C = 1 + \frac{a + 1}{3af}$$

Since $C$ is always slightly greater than 1, it need be used only if $M$ lies
close to one of the critical values of $\chi^2$.

In table 10.21.1 this test is applied to the variances of grams of fat
absorbed in the four types of fat in the doughnut example of table 10.2.1.
Here $a = 4$ and $f = 5$. The value of $M$ is 1.88, clearly not significant
with 3 d.f. To illustrate the method, $\chi^2 = M/C = 1.74$ has also been
computed.

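When the degrees of freedom differ, as with samples of unequal
sizes, the computation of $\chi^2$ is more tedious though it follows the same
pattern. The formulas are:

$$M = 2.3026\left[\left(\sum f_i\right)\log \bar{s}^2 - \sum f_i \log s_i^2\right], \qquad \bar{s}^2 = \frac{\sum f_i s_i^2}{\sum f_i}$$

$$C = 1 + \frac{1}{3(a - 1)}\left[\sum \frac{1}{f_i} - \frac{1}{\sum f_i}\right]$$

$$\chi^2 = M/C \text{ with } (a - 1) \text{ degrees of freedom}$$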

TABLE 10.21.1

Computation of Bartlett's Test of Homogeneity of Variance
All Estimates Having $f = 5$ Degrees of Freedom

Fat      $s_i^2$    $\log s_i^2$
1        178        2.2504
2        60         1.7781
3        98         1.9912
4        68         1.8325
Total    404        7.8522

$\bar{s}^2 = 100.9$, $\log \bar{s}^2 = 2.0038$

$$M = (2.3026)(5)[4(2.0038) - 7.8522] = 1.88 \quad (\text{d.f.} = 3)$$

$$C = 1 + \frac{a + 1}{3af} = 1 + \frac{5}{(3)(4)(5)} = 1.083$$

$$\chi^2 = 1.88/1.083 = 1.74 \quad (\text{d.f.} = 3), \quad P > 0.5$$

In table 10.21.2 this test is applied to the variances of the birth weights
of five litters of pigs. Since $\bar{s}^2$ is the pooled variance (weighting by degrees
of freedom), we need a column of the sums of squares. A column of the
reciprocals $1/f_i$ of the degrees of freedom is also useful in finding $C$. The
computations give $\chi^2 = 16.99$ with 4 d.f., showing that the intralitter
variances differ from litter to litter in these data.

When some or all of the $s_i^2$ are less than 1, as in these data, it is
worth noting that $\chi^2$ is unchanged if all $s_i^2$ and $\bar{s}^2$ are multiplied by
the same number (say 10 or 100). This enables you to avoid logs that are
negative.


TABLE 10.21.2

Computation of Bartlett's Test of Homogeneity of Variance.
Samples Differing in Size

Litter      Sum of Squares   Degrees of Freedom   Mean Square   $\log s_i^2$   $f_i \log s_i^2$   Reciprocal
(Sample)    $f_i s_i^2$      $f_i$                $s_i^2$                                         $1/f_i$
1           8.18             9                    0.909         -0.0414        -0.3726            0.1111
2           3.48             7                    0.497         -0.3036        -2.1252            0.1429
3           0.68             9                    0.076         -1.1192        -10.0728           0.1111
4           0.72             7                    0.103         -0.9872        -6.9104            0.1429
5           0.73             5                    0.146         -0.8357        -4.1785            0.2000
$a = 5$     13.79            37                                                -23.6595           0.7080

$$\bar{s}^2 = \frac{\sum f_i s_i^2}{\sum f_i} = \frac{13.79}{37} = 0.3727$$

$$\left(\sum f_i\right)\log \bar{s}^2 = (37)(-0.4286) = -15.8582$$

$$M = 2.3026\left[\left(\sum f_i\right)\log \bar{s}^2 - \sum f_i \log s_i^2\right] = (2.3026)[-15.8582 - (-23.6595)] = 17.96$$

$$C = 1 + \frac{1}{(3)(4)}\left[0.7080 - \frac{1}{37}\right] = 1.057$$

$$\chi^2 = M/C = 17.96/1.057 = 16.99 \quad (\text{d.f.} = 4), \quad P < 0.01$$

The $\chi^2$ approximation becomes less satisfactory if most of the $f_i$ are
less than 5. Special tables for this case are given in (28). This reference
also gives a table of the significance levels of $s_{\max}^2/s_{\min}^2$, the ratio of the
largest to the smallest of the $a$ variances. This ratio provides a quick test
of homogeneity of variance which, though less sensitive than Bartlett's
test, will often settle the issue.

Unfortunately, both Bartlett's test and this test are sensitive to non-
normality in the data, particularly to kurtosis (29). With long-tailed
distributions (positive kurtosis) the test gives too many erroneous verdicts
of heterogeneity.
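Both of Bartlett's computations can be sketched with one Python function. This is a minimal sketch, and `bartlett` is a hypothetical name. It uses natural logarithms, so the factor 2.3026 = $\log_e 10$ needed with common logarithms does not appear. With equal $f_i$ the general formula for $C$ reduces to $1 + (a + 1)/(3af)$, so the one function covers both tables. The function works from unrounded variances, so its results (about 1.75 and 17.05) differ slightly from the tabled 1.74 and 16.99. Those tabled values were computed from rounded variances and four-place logarithms.

```python
import math


def bartlett(variances, dfs):
    """Bartlett's test of homogeneity of variance.

    variances: the s_i^2; dfs: their degrees of freedom f_i.
    Returns (M, C, chi2), where chi2 = M / C is referred to the chi-square
    distribution with (a - 1) d.f. under the null hypothesis.
    """
    a = len(variances)
    f_total = sum(dfs)
    pooled = sum(f * s2 for f, s2 in zip(dfs, variances)) / f_total
    m = f_total * math.log(pooled) - sum(
        f * math.log(s2) for f, s2 in zip(dfs, variances))
    c = 1 + (sum(1 / f for f in dfs) - 1 / f_total) / (3 * (a - 1))
    return m, c, m / c


# Table 10.21.1: four fats, each variance on 5 d.f.
m1, c1, chi1 = bartlett([178, 60, 98, 68], [5, 5, 5, 5])

# Table 10.21.2: five litters; each variance is its sum of squares / d.f.
ss = [8.18, 3.48, 0.68, 0.72, 0.73]
dfs = [9, 7, 9, 7, 5]
m2, c2, chi2 = bartlett([s / d for s, d in zip(ss, dfs)], dfs)

print(f"fats:    M = {m1:.2f}, C = {c1:.3f}, chi2 = {chi1:.2f} (3 d.f.)")
print(f"litters: M = {m2:.2f}, C = {c2:.3f}, chi2 = {chi2:.2f} (4 d.f.)")
```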
CHAPTER ELEVEN


Two-way classifications


11.1 — Introduction. The experimenter often acquires the ability to 
predict roughly the behavior of his experimental material. He knows 
that in identical environments young male rats gain weight faster than 
young female rats. In a machine which subjects five different pieces of 
cloth to simulated wearing, he learns from experience that the cloths placed 
in positions 4 and 5 will receive less abrasion than those in the other posi-
tions. Such knowledge can be used to increase the accuracy of an experi-
ment. If there are $a$ treatments to be compared, he first arranges the
experimental units in groups of $a$, often called replications. The rule is
that units assigned to the same replication should be as similar in re- 
sponsiveness as possible. Each treatment is then allocated by randomiza- 
tion to one unit in each replication. This produces a two-way classifica- 
tion, since any observation is classified by the treatment which it received 
and the replication to which it belonged. 

Two-way classifications are frequent in surveys also. We have already en-
countered an example (section 9.13) in which farms were classified by soil
type and owner-tenant status. In a survey of family expenditures on 
food, classification of the results by size of family and income level is 
obviously relevant. 

We first present an example to familiarize you with the standard 
computations needed to perform the analysis of variance and make any 
desired comparisons. Later, the mathematical assumptions will be 
discussed. 

11.2 — An experiment with two criteria of classification. In agricul- 
tural experiments the agronomist tries to classify the plots into replications 
in such a way that soil fertility and growing conditions are as uniform as 
possible within any replication. In this process he utilizes any knowledge 
that he has about fertility gradients, drainage, liability to attack by pests, 
etc. One guiding principle is that, in general, plots that are close together 
tend to give similar yields. Replications are therefore usually compact 
areas of land. Within each replication one plot is assigned to each treat-
ment at random. This experimental plan is called randomized blocks, the
replication being a block of land. The two criteria of classification are
treatments and replications. 

Table 11.2.1 comes from an experiment (1) in which four seed treat- 
ments were compared with no treatment (Check) on soybean seeds. The 
data are the number of plants which failed to emerge out of 100 planted in 
each plot. 


TABLE 11.2.1

Analysis of Variance of a 2-Way Classification
(Number of failures out of 100 planted soybean seeds)

                          Replication
Treatment        1     2     3     4     5     Total    Mean
Check            8    10    12    13    11       54     10.8
Arasan           2     6     7    11     5       31      6.2
Spergon          4    10     9     8    10       41      8.2
Semesan, Jr.     3     5     9    10     6       33      6.6
Fermate          9     7     5     5     3       29      5.8
Total           26    38    42    47    35      188

Correction:           $C = (188)^2/25 = 1{,}413.76$
Total S.S.:           $8^2 + 2^2 + \cdots + 6^2 + 3^2 - C = 220.24$
Treatments S.S.:      $(54^2 + 31^2 + \cdots + 29^2)/5 - C = 83.84$
Replications S.S.:    $(26^2 + 38^2 + \cdots + 35^2)/5 - C = 49.84$

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Replications           4                     49.84             12.46
Treatments             4                     83.84             20.96
Residuals (Error)      16                    86.56             5.41
Total                  24                    220.24


The first steps are to find the treatment totals, the replication totals,
the grand total, and the usual correction for the mean. The total sum of
squares and the sum of squares for Treatments are computed just as in a
one-way classification. The new feature is that the sum of squares for
Replications is also calculated. The rule for finding this sum of squares
is the same as for Treatments: the sum of squares of the replication totals
is divided by the number of observations in each replication (5), and the
correction factor is subtracted. Finally, in the analysis of variance, we
compute the line

$$\text{Residuals} = \text{Total} - \text{Replications} - \text{Treatments}$$

As will be shown later, the Residuals mean square, 5.41, with 16 d.f.,
is an unbiased estimate of the error variance per observation.

The F ratio for treatments is 20.96/5.41 = 3.87, with 4 and 16 d.f.,
significant at the 5% level. Actually, since this experiment has certain
designed comparisons, discussed in the next section (11.3), the overall
F-test is not of great importance. Note that the Replications mean
square is more than twice the Residuals mean square. This is an indica-
tion of real differences between replication means, suggesting that the
classification into replications was successful in improving accuracy. A
method of estimating the amount of gain in accuracy will be presented in
section 11.7.
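The computing rules of table 11.2.1 translate directly into Python. This is a minimal sketch for a table with one observation per cell, and `two_way_anova` is a hypothetical helper. Rows are treatments and columns are replications. The Residuals line is found by subtraction, as in the text.

```python
def two_way_anova(rows):
    """Analysis of variance of a two-way table with one observation per cell.

    rows: list of treatment rows, each a list of b replication values.
    Returns a dict of (df, SS, MS) per source plus the treatment F ratio.
    """
    a, b = len(rows), len(rows[0])
    grand_total = sum(sum(r) for r in rows)
    c = grand_total ** 2 / (a * b)                              # correction for the mean
    total_ss = sum(x * x for r in rows for x in r) - c
    treat_ss = sum(sum(r) ** 2 for r in rows) / b - c           # row totals, divisor b
    rep_ss = sum(sum(col) ** 2 for col in zip(*rows)) / a - c   # column totals, divisor a
    resid_ss = total_ss - treat_ss - rep_ss                     # found by subtraction
    parts = {
        "Replications": (b - 1, rep_ss),
        "Treatments": (a - 1, treat_ss),
        "Residuals": ((a - 1) * (b - 1), resid_ss),
    }
    out = {k: (df, ss, ss / df) for k, (df, ss) in parts.items()}
    out["F_treatments"] = out["Treatments"][2] / out["Residuals"][2]
    return out


soybeans = [
    [8, 10, 12, 13, 11],   # Check
    [2, 6, 7, 11, 5],      # Arasan
    [4, 10, 9, 8, 10],     # Spergon
    [3, 5, 9, 10, 6],      # Semesan, Jr.
    [9, 7, 5, 5, 3],       # Fermate
]
result = two_way_anova(soybeans)
for source in ("Replications", "Treatments", "Residuals"):
    df, ss, ms = result[source]
    print(f"{source:<13} {df:>3} {ss:8.2f} {ms:7.2f}")
print(f"F = {result['F_treatments']:.2f}")
```

The same function applied to the citrus data of example 11.2.1 reproduces the answer given there.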


EXAMPLE 11.2.1 — In three species of citrus trees the ratio of leaf area to dry weight 
was determined for three conditions of shading (2). 


Shading       Shamouti Orange    Marsh Grapefruit    Clementine Mandarin
Sun           112                90                  123
Half shade    86                 73                  89
Shade         80                 62                  81

Compute the analysis of variance. Ans. Mean squares for shading and error, 942.1 and
21.8. F = 43.2, with 2 and 4 df. The shading was effective in decreasing the relative leaf
area. See example 11.5.4 for further discussion.


EXAMPLE 11.2.2 — When there are only two treatments, the data reduce to two paired
samples, previously analyzed by the t-test in chapter 4. This t-test is equivalent to the F-test
of treatments as given in this section. Verify this result by performing the analysis of
variance of the mosaic virus example in section 4.3, p. 95, as follows:

                        Degrees of Freedom    Sum of Squares    Mean Square
Replications (Pairs)    7                     575               82.2
Treatments              1                     64                64.0
Error                   7                     65                9.29

F = 6.89, d.f. = 1, 7. $\sqrt{F} = 2.63 = t$, as given on p. 94.


11.3 — Comparisons among means. The discussion of different types
of comparisons in sections 10.7 and 10.8 applies also to two-way classifica-
tions. To illustrate a planned comparison, we compare the mean number
of failures for the Check with the corresponding average for the four
Chemicals. From table 11.2.1 the means are:

Check    Arasan    Spergon    Semesan, Jr.    Fermate
10.8     6.2       8.2        6.6             5.8

The comparison is, therefore,

$$10.8 - \frac{6.2 + 8.2 + 6.6 + 5.8}{4} = 10.8 - 6.7 = 4.1$$

The experiment has five replications, with $s = \sqrt{5.41} = 2.326$ (16 d.f.).
Hence, by Rule 10.7.1, the estimated standard error of the above difference is

$$\frac{s}{\sqrt{5}}\sqrt{1^2 + \left(\tfrac{1}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2} = \frac{2.326}{\sqrt{5}}\sqrt{\frac{5}{4}} = \frac{2.326}{2} = 1.163$$

with 16 d.f. Thus 95% confidence limits for the average reduction in
failure rate due to the Chemicals are

$$4.1 \pm (2.120)(1.163) = 4.1 \pm 2.5, \text{ i.e., } 1.6 \text{ and } 6.6$$

The next step is to compare the means for the four Chemicals. For
this, the discussion in section 10.8 is relevant. The LSD is

$$t_{0.05}\, s\sqrt{2/n} = (2.120)(2.326)\sqrt{2/5} = 3.12$$

Since the largest difference between any two means is 8.2 - 5.8 = 2.4,
there are no significant differences among the Chemicals. You may verify
that the Studentized Range Q-test requires a difference of 4.21 for sig-
nificance at the 5% level, giving, of course, the same verdict as the LSD
test.
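The planned comparison and the LSD can be reproduced in Python. This is a minimal sketch. Python's standard library has no t quantile function, so the 5% two-tailed value 2.120 for 16 d.f. is taken from the text as a constant. The standard error follows Rule 10.7.1: $s/\sqrt{n}$ times the square root of the sum of squared coefficients.

```python
import math

means = {"Check": 10.8, "Arasan": 6.2, "Spergon": 8.2,
         "Semesan, Jr.": 6.6, "Fermate": 5.8}
n = 5              # replications behind each treatment mean
ms_error = 5.41    # Residuals mean square, 16 d.f.
t_05 = 2.120       # 5% two-tailed t for 16 d.f., the value used in the text

s = math.sqrt(ms_error)

# Planned comparison: Check minus the average of the four Chemicals
coefs = {"Check": 1.0, "Arasan": -0.25, "Spergon": -0.25,
         "Semesan, Jr.": -0.25, "Fermate": -0.25}
estimate = sum(coefs[k] * means[k] for k in means)
se = s / math.sqrt(n) * math.sqrt(sum(c * c for c in coefs.values()))
low, high = estimate - t_05 * se, estimate + t_05 * se

# Least significant difference for comparing two Chemicals
lsd = t_05 * s * math.sqrt(2 / n)
chemicals = [means[k] for k in means if k != "Check"]
largest_diff = max(chemicals) - min(chemicals)

print(f"difference = {estimate:.1f}, s.e. = {se:.3f}, limits {low:.1f} to {high:.1f}")
print(f"LSD = {lsd:.2f}; largest difference among Chemicals = {largest_diff:.1f}")
```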


11.4 — Algebraic notation. For the results of a two-way classification,
table 11.4.1 gives an algebraic notation that has become standard in
mathematical statistics. $X_{ij}$ represents the measurement obtained for the
unit that is in the $i$th row (treatment) and $j$th column (replication). Row
totals and means are denoted by $X_{i.}$ and $\bar{X}_{i.}$, respectively, while $X_{.j}$ and
$\bar{X}_{.j}$ denote column totals and means. The overall mean is $\bar{X}_{..}$. General
instructions for computing the analysis of variance appear under the
table. Note that the number of df for Residuals (Error) is $(a - 1)(b - 1)$,
the product of the numbers of d.f. for rows and columns.

TABLE 11.4.1

Algebraic Representation of a 2-Way Table With a Treatments and b Replications
(Computing instructions and analysis of variance)

Rows: treatments $i = 1, \ldots, a$. Columns: replications $j = 1, \ldots, b$.

$$
\begin{array}{c|ccccc|cc}
 & 1 & \cdots & j & \cdots & b & \text{Sum} & \text{Mean} \\
\hline
1 & X_{11} & \cdots & X_{1j} & \cdots & X_{1b} & X_{1.} & \bar{X}_{1.} \\
2 & X_{21} & \cdots & X_{2j} & \cdots & X_{2b} & X_{2.} & \bar{X}_{2.} \\
i & X_{i1} & \cdots & X_{ij} & \cdots & X_{ib} & X_{i.} & \bar{X}_{i.} \\
a & X_{a1} & \cdots & X_{aj} & \cdots & X_{ab} & X_{a.} & \bar{X}_{a.} \\
\hline
\text{Sum} & X_{.1} & \cdots & X_{.j} & \cdots & X_{.b} & X_{..} & \\
\text{Mean} & \bar{X}_{.1} & \cdots & \bar{X}_{.j} & \cdots & \bar{X}_{.b} & & \bar{X}_{..}
\end{array}
$$

Correction:      $C = (\sum X)^2/ab = X_{..}^2/ab$
Total:           $\sum X_{ij}^2 - C$
Treatments:      $A = (X_{1.}^2 + \cdots + X_{a.}^2)/b - C$
Replications:    $B = (X_{.1}^2 + \cdots + X_{.b}^2)/a - C$
Residuals:       $D$ = Total - (Treatments + Replications)

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Treatments             $a - 1$               $A$               $A/(a - 1)$
Replications           $b - 1$               $B$               $B/(b - 1)$
Residuals              $(a - 1)(b - 1)$      $D$               $D/(a - 1)(b - 1)$
Total                  $ab - 1$              $A + B + D$

In this book we have kept algebraic symbolism to a minimum, in 
order to concentrate attention on the data. The symbols are useful, how- 
ever, in studying the structure of the two-way classification in the next 
section. 

11.5 — Mathematical model for a two-way classification. The model
being used is

$$X_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, b,$$

where $\mu$ represents the overall mean, the $\alpha_i$ stand for fixed row (treatment)
effects and the $\beta_j$ for fixed column (replication) effects. The convention

$$\sum \alpha_i = \sum \beta_j = 0$$

is usually adopted.

This model involves two basic assumptions:

1. The mathematical form $(\mu + \alpha_i + \beta_j)$ implies that row and column
effects are additive. Apart from experimental errors, the difference in
effect between treatment 2 and treatment 1 in replication $j$ is

$$(\mu + \alpha_2 + \beta_j) - (\mu + \alpha_1 + \beta_j) = \alpha_2 - \alpha_1$$

This difference is the same in all replications. When we analyze real data,
there is no assurance that row and column effects are exactly additive.
The additive model is used because of its simplicity and because it is
often a good approximation to more complex types of relationships.

2. The $\varepsilon_{ij}$ are independent random variables, normally distributed
with mean 0 and variance $\sigma^2$. They represent the extent to which the
data depart from the additive model because of experimental errors.

As an aid to understanding the model we shall construct a set of data
by its use. Let

$$\mu = 30$$

$$\alpha_1 = 10,\ \alpha_2 = 3,\ \alpha_3 = 0,\ \alpha_4 = -13; \qquad \sum \alpha_i = 0$$

$$\beta_1 = 1,\ \beta_2 = -4,\ \beta_3 = 3; \qquad \sum \beta_j = 0$$

The $\varepsilon_{ij}$ are drawn at random from table 3.2.1, each decreased by 30. This
makes the $\varepsilon_{ij}$ approximately normal with mean 0 and variance 25.

In each cell of table 11.5.1, $\mu = 30$ is entered first. Next is the treat-
ment effect $\alpha_i$, differing from row to row. Following this is the replication
effect, one in each column. In each cell, the sum of these three parts is
fixed by $\mu$, the $\alpha_i$, and the $\beta_j$. Sampling variation is introduced by the
fourth entry, a deviation drawn at random from table 3.2.1. According
to the model, $X_{ij}$ is the sum of the four entries just described.

TABLE 11.5.1

Experiment Constructed According to Model I, $\mu = 30$
(Each cell shows $\mu + \alpha_i + \beta_j + \varepsilon_{ij} = X_{ij}$)

                                          Replication
Treatment          $\beta_1 = 1$            $\beta_2 = -4$           $\beta_3 = 3$            Total   Mean
$\alpha_1 = 10$    30 + 10 + 1 - 11 = 30    30 + 10 - 4 - 7 = 29     30 + 10 + 3 + 3 = 46     105     35
$\alpha_2 = 3$     30 + 3 + 1 + 1 = 35      30 + 3 - 4 + 5 = 34      30 + 3 + 3 - 3 = 33      102     34
$\alpha_3 = 0$     30 + 0 + 1 + 0 = 31      30 + 0 - 4 + 4 = 30      30 + 0 + 3 - 1 = 32      93      31
$\alpha_4 = -13$   30 - 13 + 1 - 2 = 16     30 - 13 - 4 - 2 = 11     30 - 13 + 3 + 1 = 21     48      16
Total              112                      104                      132                      348
Mean               28                       26                       33                               29

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Replications           2                     104               52
Treatments             3                     702               234
Residuals              6                     132               22

Some features of the model are now apparent:

(i) The effects of the treatments are not influenced by the $\beta_j$ because
the sum of the $\beta_j$ in each row is zero. If there were no errors, check from
table 11.5.1 that the sum for treatment 1 would be 41 + 36 + 43 = 120,
the mean being $40 = \mu + \alpha_1$. The observed mean, $\bar{X}_{1.} = 35$, differs from
40 by the mean of the $\varepsilon_{1j}$, namely $(-11 - 7 + 3)/3 = -5$. This is an
instance of the general result

$$\bar{X}_{i.} = \mu + \alpha_i + (\varepsilon_{i1} + \varepsilon_{i2} + \cdots + \varepsilon_{ib})/b$$

This result shows that $\bar{X}_{i.}$ is an unbiased estimate of $\mu + \alpha_i$ and that its
variance is $\sigma^2/b$, because the error of the estimate is the mean of $b$ inde-
pendent errors, each with variance $\sigma^2$.

(ii) In the same way, the replication means are unbiased estimates of
$\mu + \beta_j$, with variance $\sigma^2/a$.

(iii) In the analysis of variance the Residuals mean square, 22, is an
unbiased estimate of $\sigma^2 = 25$. More explanation on this point will be
given presently.

(iv) The mean square for Replications is inflated by the $\beta_j$ and that
for Treatments by the $\alpha_i$. The expected values of these mean squares
are shown in table 11.5.2, which deserves careful study. Note that the
expected value of the Treatments mean square is the same as in a one-way
classification with $b$ observations in each class (compare with equation
10.4.1, p. 265).


TABLE 11.5.2

Component Analysis of the Constructed Experiment

Source of Variation    Degrees of Freedom    Mean Square    Expected Value (Parameters Estimated)
Replications           2                     52             $\sigma^2 + a\kappa_B^2$
Treatments             3                     234            $\sigma^2 + b\kappa_A^2$
Residuals              6                     22             $\sigma^2$

$$\kappa_B^2 = \frac{\sum \beta_j^2}{b - 1} = \frac{(1)^2 + (-4)^2 + (3)^2}{2} = 13, \qquad \kappa_A^2 = \frac{\sum \alpha_i^2}{a - 1} = \frac{(10)^2 + (3)^2 + (0)^2 + (-13)^2}{3} = 92\tfrac{2}{3}$$

$s_B^2 = (52 - 22)/4 = 8$ estimates 13; $s_A^2 = (234 - 22)/3 = 71$ estimates $92\tfrac{2}{3}$

Error Mean Square = 22 estimates 25
Replications Mean Square = 52 estimates 25 + 4(13) = 77
Treatments Mean Square = 234 estimates $25 + 3(92\tfrac{2}{3}) = 303$

We turn to the estimates of $\mu$, $\alpha_i$, and $\beta_j$. These estimates are

$$\hat{\mu} = \bar{X}_{..}; \qquad \hat{\alpha}_i = \bar{X}_{i.} - \bar{X}_{..}; \qquad \hat{\beta}_j = \bar{X}_{.j} - \bar{X}_{..}$$

If we estimate any individual observation $X_{ij}$ from the fitted model,
the estimate is

$$\hat{X}_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j = \bar{X}_{..} + (\bar{X}_{i.} - \bar{X}_{..}) + (\bar{X}_{.j} - \bar{X}_{..}) = \bar{X}_{i.} + \bar{X}_{.j} - \bar{X}_{..}$$

Table 11.5.3 shows the original observations $X_{ij}$, the estimates $\hat{X}_{ij}$,
and the deviations of the observations from the estimates, $D_{ij} = X_{ij} - \hat{X}_{ij}$.
For treatment 1 in replication 2, for instance, we have from table 11.5.1,

$$X_{12} = 29, \qquad \hat{X}_{12} = 35 + 26 - 29 = 32, \qquad D_{12} = -3$$


TABLE 11.5.3

Linear Model Fitted to the Observations in Table 11.5.1

                                    Replication
Treatment                        1      2      3
1            $X_{1j}$            30     29     46
             $\hat{X}_{1j}$      34     32     39
             $D_{1j}$            -4     -3     +7
2            $X_{2j}$            35     34     33
             $\hat{X}_{2j}$      33     31     38
             $D_{2j}$            +2     +3     -5
3            $X_{3j}$            31     30     32
             $\hat{X}_{3j}$      30     28     35
             $D_{3j}$            +1     +2     -3
4            $X_{4j}$            16     11     21
             $\hat{X}_{4j}$      15     13     20
             $D_{4j}$            +1     -2     +1


The deviations $D_{ij}$ have three important properties:

(i) Their sum is zero in any row or column.

(ii) Their sum of squares,

$$(-4)^2 + (+2)^2 + \cdots + (-3)^2 + (+1)^2 = 132,$$

is equal to the Residuals sum of squares in the analysis of variance at the
foot of table 11.5.1. Thus the Residuals sum of squares measures the
extent to which the linear additive model fails to fit the data. This result
is a consequence of a general algebraic identity:

$$\text{Residuals S.S.} = \sum_i \sum_j (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$$

$$= \sum_i \sum_j (X_{ij} - \bar{X}_{..})^2 - b\sum_i (\bar{X}_{i.} - \bar{X}_{..})^2 - a\sum_j (\bar{X}_{.j} - \bar{X}_{..})^2$$

$$= \text{Total S.S.} - \text{Treatments S.S.} - \text{Replications S.S.}$$

This equation shows that the analysis of variance is a quick method 
of finding the sum of squares of the deviations of the observations from 
the fitted model. When the analysis is programmed for an electronic 
computer, it is customary to compute and print the $D_{ij}$. This serves two
purposes. It enables the investigator to glance over the $D_{ij}$ for signs of
gross errors or systematic departures from the linear model, and it pro-
vides a check on the Residuals sum of squares.
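Computing and printing the $D_{ij}$, as described above, takes only a few lines. This is a minimal Python sketch, and `additive_fit` is a hypothetical helper. It fits the additive model by row, column, and grand means. It then returns the fitted values $\hat{X}_{ij} = \bar{X}_{i.} + \bar{X}_{.j} - \bar{X}_{..}$ and the deviations, so the Residuals sum of squares can be checked against the analysis of variance.

```python
def additive_fit(rows):
    """Fit X_ij = mu + alpha_i + beta_j using row, column and grand means.

    Returns (fitted, deviations), where fitted[i][j] = mean_i. + mean_.j - mean_..
    and deviations[i][j] = X_ij - fitted[i][j].
    """
    a, b = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (a * b)
    row_means = [sum(r) / b for r in rows]
    col_means = [sum(col) / a for col in zip(*rows)]
    fitted = [[rm + cm - grand for cm in col_means] for rm in row_means]
    deviations = [[x - f for x, f in zip(r, fr)] for r, fr in zip(rows, fitted)]
    return fitted, deviations


# Observations X_ij of the constructed experiment (table 11.5.1)
X = [[30, 29, 46],
     [35, 34, 33],
     [31, 30, 32],
     [16, 11, 21]]
fitted, D = additive_fit(X)
residual_ss = sum(d * d for row in D for d in row)
print("fitted:", fitted)
print("D:", D)
print("Residuals S.S. =", residual_ss)
```

The same helper can be used for examples 11.5.1 and 11.5.4 below.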

(iii) From the constructed model you may verify the remarkable re-
sult that

$$D_{ij} = \varepsilon_{ij} - \bar{\varepsilon}_{i.} - \bar{\varepsilon}_{.j} + \bar{\varepsilon}_{..}$$

For example, for treatment 1 in replication 2 you will find from table 11.5.1,

$$\varepsilon_{12} = -7; \quad \bar{\varepsilon}_{1.} = -5; \quad \bar{\varepsilon}_{.2} = 0; \quad \bar{\varepsilon}_{..} = -1$$

$$\varepsilon_{12} - \bar{\varepsilon}_{1.} - \bar{\varepsilon}_{.2} + \bar{\varepsilon}_{..} = (-7) - (-5) - (0) + (-1) = -3,$$

in agreement with $D_{12} = -3$ in table 11.5.3. Thus, if the additive model
holds, each $D_{ij}$ is a linear combination of the random errors. It may be
shown that any $D_{ij}^2$ is an unbiased estimate of $(a - 1)(b - 1)\sigma^2/ab$. It
follows that the Residuals sum of squares is an unbiased estimate of
$(a - 1)(b - 1)\sigma^2$. This gives the basic result that the Residuals mean
square, with $(a - 1)(b - 1)$ d.f., is an unbiased estimate of $\sigma^2$.
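In the spirit of the book's experimental sampling, the expected values in table 11.5.2 can be checked by simulation. This is a minimal Python sketch. It assumes normal errors generated by `random.gauss` in place of draws from table 3.2.1, and the trial count and seed are arbitrary choices. Averaged over many experiments constructed from the same model, the Residuals mean square should settle near $\sigma^2 = 25$. The Replications and Treatments mean squares should settle near $25 + 4(13) = 77$ and $25 + 3(92\tfrac{2}{3}) = 303$.

```python
import random

MU = 30
ALPHA = [10, 3, 0, -13]   # treatment effects, summing to zero
BETA = [1, -4, 3]         # replication effects, summing to zero
SIGMA = 5.0               # error s.d., so sigma^2 = 25


def mean_squares(rows):
    """Replications, Treatments and Residuals mean squares of a two-way table."""
    a, b = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (a * b)
    row_means = [sum(r) / b for r in rows]
    col_means = [sum(col) / a for col in zip(*rows)]
    treat_ss = b * sum((m - grand) ** 2 for m in row_means)
    rep_ss = a * sum((m - grand) ** 2 for m in col_means)
    resid_ss = sum((rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    return rep_ss / (b - 1), treat_ss / (a - 1), resid_ss / ((a - 1) * (b - 1))


def sampling_experiment(trials=20000, seed=1):
    """Average the three mean squares over many experiments built from the model."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(trials):
        rows = [[MU + al + be + rng.gauss(0, SIGMA) for be in BETA] for al in ALPHA]
        for k, ms in enumerate(mean_squares(rows)):
            sums[k] += ms
    return [s / trials for s in sums]


reps, treats, resid = sampling_experiment()
print(f"Replications {reps:.1f} (expect 77), Treatments {treats:.1f} (expect 303), "
      f"Residuals {resid:.1f} (expect 25)")
```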

To summarize the salient features, the additive model implies that the
treatment effects $\alpha_i$ are the same in every replication, and vice versa.
If additivity holds (apart from independent errors), the observed treatment
means are unbiased estimates of the treatment effects. The F-test may be
applied both to Treatments and Replications. The Residuals mean square
measures the extent to which the additive model fails to fit the data and
provides an unbiased estimate of $\sigma^2$.

EXAMPLE 11.5.1 — Suppose that with $a = b = 2$, treatment and replication effects are
multiplicative. Treatment 2 gives results 20% higher than treatment 1, and replication 2
gives results 10% higher than replication 1. With no random errors, the observations would
be as shown on the left below.

                 Replication                             Replication
Treatment        1        2           Treatment          1         2
1                1.00     1.10        1                  0.995     1.105
2                1.20     1.32        2                  1.205     1.315

Verify that the $\hat{X}_{ij}$ given by fitting the linear model are as shown on the right above.
Any $D_{ij}$ is only $\pm 0.005$. The linear model gives a good fit to a multiplicative model when
treatment and replication effects are small or moderate. If, however, treatment 2 gives a
100% increase and replication 2 a 50% increase, you will find $D_{ij} = \pm 0.125$, not so good a fit.

EXAMPLE 11.5.2 — In table 11.5.3, verify that $\hat{X}_{33} = 35$, $D_{33} = -3$.

EXAMPLE 11.5.3 — Perform an analysis of variance of the $\hat{X}_{ij}$ in table 11.5.3. Verify
that the Treatments and Replications sums of squares are the same as for the $X_{ij}$, but that
the Residuals sum of squares is zero. Can you explain these results?

EXAMPLE 11.5.4 — Calculate the $D_{ij}$ for the 3 × 3 citrus data in example 11.2.1 and
verify that the Residuals mean square, computed from the $D_{ij}$, is 21.8. Carry one decimal
place in the $D_{ij}$.

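EXAMPLE 11.5.5 — The result

$$D_{ij} = X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..}$$

shows that $D_{ij}$ is a linear combination of the form $\sum\sum \lambda_{ij} X_{ij}$. By Rule 10.7.1, its variance is
$\sigma^2 \sum\sum \lambda_{ij}^2$. For $D_{11}$, for example, the $\lambda_{ij}$ work out as follows:

Observations              No. of Terms        $\lambda_{ij}$
$X_{11}$                  1                   $(a - 1)(b - 1)/ab$
Rest of row 1             $(b - 1)$           $-(a - 1)/ab$
Rest of column 1          $(a - 1)$           $-(b - 1)/ab$
All other observations    $(a - 1)(b - 1)$    $+1/ab$

It follows that $\sum\sum \lambda_{ij}^2 = (a - 1)(b - 1)/ab$. Thus $D_{11}^2$, and similarly any $D_{ij}^2$, estimates
$(a - 1)(b - 1)\sigma^2/ab$, as stated in the text.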


11.6 — Partitioning the treatments sum of squares. When the treat- 
ments contain certain planned comparisons, it is often possible to parti- 
tion the Treatments sum of squares in the analysis of variance in a way 
that is helpful. Some rules for doing this will now be given. In the 
analysis of variance, comparisons are usually calculated from the treat-
ment totals $T_i$ rather than the means, since this saves time and avoids
rounding errors.

Rule 11.6.1 — If $L = \lambda_1 T_1 + \cdots + \lambda_a T_a$ ($\sum \lambda_i = 0$) is a comparison
among the treatment totals, then

$$\frac{L^2}{n\sum \lambda_i^2}$$

is a part of the sum of squares for treatments, associated with a single de-
gree of freedom, where $n$ is the number of observations in any treatment
total.

In the experiment on seed treatment of soybeans (table 11.2.1) the
comparison Check vs. Chemicals may be represented as follows:

                 Check    Arasan    Spergon    Semesan, Jr.    Fermate
Total ($T_i$)    54       31        41         33              29
$\lambda_i$      4        -1        -1         -1              -1

To avoid fractions the $\lambda_i$ have been taken as 4, -1, -1, -1, -1 instead of
as 1, -1/4, -1/4, -1/4, -1/4 in section 11.3. This gives

$$L = 4(54) - 31 - 41 - 33 - 29 = 82$$

Since $n = 5$, the contribution to the Treatments sum of squares is

$$\frac{L^2}{n\sum \lambda_i^2} = \frac{(82)^2}{(5)(20)} = 67.24 \quad (1 \text{ d.f.})$$

The Treatments sum of squares was 83.84 with 4 d.f. The remaining part
is therefore 16.60 with 3 d.f. What does it represent? As might be guessed,
it represents the sum of squares of deviations of the totals for the four
Chemicals from their mean, namely,

$$\frac{31^2 + 41^2 + 33^2 + 29^2}{5} - \frac{134^2}{20} = 16.60$$

Thus, the original analysis of variance in table 11.2.1 might be reported
as follows:

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Check vs. Chemicals    1                     67.24             67.24
Among Chemicals        3                     16.60             5.53
Residuals (Error)      16                    86.56             5.41


The F ratio 67.24/5.41 = 12.43 (P < 0.01) shows that the average failure 
rates are different for Check and Chemicals (though as usual it does not 
tell us the size and direction of the effect in terms of means). The F ratio 
5.53/5.41 = 1.02 for Among Chemicals warns us that there are unlikely 
to be any significant differences among Chemicals, as was already verified. 

As a second example, consider the data on the effect of shade on the
ratio of leaf area to leaf weight in citrus trees (example 11.2.1). The
"treatment" totals, $n = 3$, were as follows:

Comparison                     Sun    Half Shade    Shade    $L$    Divisor $n\sum\lambda_i^2$    $L^2/n\sum\lambda_i^2$
Totals ($T_i$)                 325    248           223
Effect of shade ($L_1$)        +1     0             -1       102    6                             1734
Half shade vs. Rest ($L_2$)    +1     -2            +1       52     18                            150

We might measure the effect of shade by the extreme comparison
$L_1$ = (Sun - Shade). We might also be interested in whether the results
for Half Shade are the simple average of those for Sun and Shade. This
gives the comparison $L_2$.

Rule 11.6.2. Two comparisons

$$L_1 = \lambda_{11}T_1 + \lambda_{12}T_2 + \dots + \lambda_{1a}T_a = \sum \lambda_{1i}T_i$$
$$L_2 = \lambda_{21}T_1 + \lambda_{22}T_2 + \dots + \lambda_{2a}T_a = \sum \lambda_{2i}T_i$$

are orthogonal if

$$\lambda_{11}\lambda_{21} + \lambda_{12}\lambda_{22} + \dots + \lambda_{1a}\lambda_{2a} = 0, \quad \text{i.e.,} \quad \sum \lambda_{1i}\lambda_{2i} = 0$$

In applying this rule, if a total $T_i$ does not enter into a comparison, its
coefficient is taken as zero.

The comparisons $L_1$ and $L_2$ are orthogonal, since

$$(+1)(+1) + (0)(-2) + (-1)(+1) = 0$$

Rule 11.6.3. If two comparisons are orthogonal, their contributions
$L_1^2/n\sum\lambda_{1i}^2$ and $L_2^2/n\sum\lambda_{2i}^2$ are independent parts of the sum of squares
for treatments, each with 1 d.f.

This means that the Treatments S.S. may be partitioned into the
contributions due to $L_1$ and $L_2$, plus any remainder (with $a - 3$ d.f.).
A consequence of this rule is

Rule 11.6.4. Among $a$ treatments, if $(a - 1)$ comparisons are mutually
orthogonal (i.e., every pair is orthogonal), then

$$\frac{L_1^2}{n\sum\lambda_{1i}^2} + \frac{L_2^2}{n\sum\lambda_{2i}^2} + \dots + \frac{L_{a-1}^2}{n\sum\lambda_{a-1,i}^2} = \text{Treatments S.S.}$$


The citrus data, with $a = 3$, are an example. The sum of the contributions
for $L_1$ and $L_2$ is 1734 + 150 = 1884, which may be verified to be
the Treatments S.S. Thus, the relevant part of the analysis of variance
can be presented as follows:


Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square       F
Effect of shade                1                  1734             1734         79.5
Half shade vs. Rest            1                   150              150          6.9
Error                          4                    87             21.8



The F value for the effect of shade is highly significant. With 1 and 4 d.f., 
F = 6.9 for the comparison of half shade with the average of sun and 
shade does not quite reach the 5% level. There is a suggestion, however, 
that the results for half shade are closer to those for shade than to those 
for sun. Both these comparisons can, of course, be examined by t-tests
on the treatment means.
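Rules 11.6.2 to 11.6.4 can be verified on the citrus totals with exact rational arithmetic, so that the equality with the Treatments S.S. holds without rounding. This is an illustrative Python sketch; the helper names `dot` and `contribution` are hypothetical.

```python
from fractions import Fraction

T = [325, 248, 223]   # Sun, Half shade, Shade totals
n = 3                 # replications per treatment
L1 = [1, 0, -1]       # effect of shade
L2 = [1, -2, 1]       # half shade vs. rest


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def contribution(coefs):
    """L**2 / (n * sum of squared coefficients), as an exact fraction."""
    L = dot(coefs, T)
    return Fraction(L * L, n * dot(coefs, coefs))


a = len(T)
treatments_ss = Fraction(dot(T, T), n) - Fraction(sum(T) ** 2, n * a)

print(dot(L1, L2))                                      # orthogonality check
print(float(contribution(L1)), float(contribution(L2)))
print(float(treatments_ss))
```

With a = 3, the two orthogonal comparisons use up both d.f., so their contributions add exactly to the Treatments S.S. (1884.2 before rounding).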

EXAMPLE 11.6.1. In the following artificial example, two of the treatments were
variants of one type of process, while the other four were variants of a second type. The
treatment totals (4 replications) were:

  Process 1             Process 2
  59    68         70    84    76    81

Partition the Treatments S.S. as follows:

Source of Variation        Degrees of Freedom    Sum of Squares    Mean Square
Between processes                  1                  67.69            67.69
Variants of process 1              1                  10.12            10.12
Variants of process 2              3                  28.19             9.40
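One way to check the answers to example 11.6.1 is to compute each line of the partition directly. The sketch assumes the coefficients (-2, -2, +1, +1, +1, +1) for Between processes; any multiple of them gives the same sum of squares. The helper names are hypothetical.

```python
from fractions import Fraction

totals = [59, 68, 70, 84, 76, 81]   # Process 1: first two; Process 2: last four
n = 4                                # replications per treatment


def comparison_ss(coefs):
    L = sum(c * t for c, t in zip(coefs, totals))
    return Fraction(L * L, n * sum(c * c for c in coefs))


def group_ss(group):
    """Sum of squares among a group of treatment totals, per-observation scale."""
    return Fraction(sum(t * t for t in group), n) - Fraction(sum(group) ** 2, n * len(group))


between = comparison_ss([-2, -2, 1, 1, 1, 1])   # 1 d.f.
within_p1 = group_ss(totals[:2])                 # 1 d.f.
within_p2 = group_ss(totals[2:])                 # 3 d.f.
treatments_ss = group_ss(totals)                 # 5 d.f.

for name, ss in [("Between processes", between),
                 ("Variants of process 1", within_p1),
                 ("Variants of process 2", within_p2)]:
    print(f"{name:24s} {float(ss):8.2f}")
```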






11.7. Efficiency of blocking. When an experiment has been set out in
replications, using the randomized blocks design, it is sometimes of in- 
terest to know how effective the blocking was in increasing the precision 
of the comparisons, particularly if there is doubt whether the criterion 
used in constructing the replications is a good one, or if the use of these 
replications is troublesome. From the analysis of variance of a random- 
ized blocks experiment, we can estimate the error variance that would 
have been obtained if a completely random arrangement of the same ex- 
perimental units (plots) had been used instead of randomized blocks. 

Call the two error variances $s_{CR}^2$ and $s_{RB}^2$. With randomized blocks
the variance of a treatment mean is $s_{RB}^2/b$. To get the same variance of a
treatment mean with complete randomization, the number of replications
$n$ must satisfy the relation

$$\frac{s_{CR}^2}{n} = \frac{s_{RB}^2}{b} \quad \text{or} \quad \frac{n}{b} = \frac{s_{CR}^2}{s_{RB}^2}$$

For this reason the ratio $s_{CR}^2/s_{RB}^2$ is used to measure the relative efficiency
of the blocking.

If $M_B$ and $M_E$ are the mean squares for blocks and error in the analysis
of variance of a randomized blocks experiment that has been performed, it
has been shown (3, 4) that

$$\frac{s_{CR}^2}{s_{RB}^2} = \frac{(b - 1)M_B + b(a - 1)M_E}{(ab - 1)M_E}$$

Using the soybeans experiment as an example (table 11.2.1),
$M_B = 12.46$, $M_E = 5.41$, $a = b = 5$,

$$\frac{s_{CR}^2}{s_{RB}^2} = \frac{4(12.46) + 20(5.41)}{24(5.41)} = 1.22$$


With complete randomization, about six replications instead of five would 
have been necessary to obtain the same standard error of a treatment 
mean. 
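The efficiency formula is easy to wrap as a function. A minimal Python sketch (the function name is hypothetical), applied to the soybean mean squares and, as a cross-check, to the wheat analysis of example 11.7.1:

```python
def rb_relative_efficiency(ms_blocks, ms_error, a, b):
    """Estimated s_CR^2 / s_RB^2 for a randomized blocks experiment
    with a treatments and b blocks (formula of section 11.7)."""
    return ((b - 1) * ms_blocks + b * (a - 1) * ms_error) / ((a * b - 1) * ms_error)


soy = rb_relative_efficiency(12.46, 5.41, a=5, b=5)
print(round(soy, 2), round(5 * soy, 1))   # ratio, replications needed without blocks

wheat = rb_relative_efficiency(5.36, 2.19, a=4, b=5)
print(round(wheat, 2))
```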

This comparison is not quite fair to complete randomization, which 
would provide 20 d.f. for error as against 16 with randomized blocks and 
therefore require smaller values of t in calculating confidence intervals. 
This is taken into account by a formula suggested by Fisher (5), which 
replaces the ratio $s_{CR}^2/s_{RB}^2$ by the following ratio:

$$\text{Relative amount of information} = \frac{(f_{RB} + 1)(f_{CR} + 3)}{(f_{RB} + 3)(f_{CR} + 1)} \cdot \frac{s_{CR}^2}{s_{RB}^2} = \frac{(16 + 1)(20 + 3)}{(16 + 3)(20 + 1)}(1.22) = 1.20$$

The adjustment for d.f. has little effect here but makes more difference in
small experiments.
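Fisher's adjustment can be sketched the same way. Here the error d.f. are taken as f_RB = (a - 1)(b - 1) and f_CR = a(b - 1), which reproduce the 16 and 20 used above; the function name is hypothetical.

```python
def fisher_relative_information(ratio, a, b):
    """Multiply s_CR^2 / s_RB^2 by Fisher's d.f. factor."""
    f_rb = (a - 1) * (b - 1)   # error d.f. with randomized blocks
    f_cr = a * (b - 1)         # error d.f. with complete randomization
    return (f_rb + 1) * (f_cr + 3) / ((f_rb + 3) * (f_cr + 1)) * ratio


soy_info = fisher_relative_information(1.22, a=5, b=5)
print(round(soy_info, 2))                                   # soybeans

wheat_ratio = (4 * 5.36 + 5 * 3 * 2.19) / (19 * 2.19)       # example 11.7.1
wheat_info = fisher_relative_information(wheat_ratio, a=4, b=5)
print(round(wheat_info, 2))
```

With these d.f., the second call also reproduces the answer 1.26 given for part (iii) of example 11.7.1.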
EXAMPLE 11.7.1. In a randomized-blocks experiment which compared four strains
of Gallipoli wheat (6) the mean yields (pounds per plot) and the analysis of variance were as
follows:


Strain          A       B       C       D
Mean yield    34.4    34.8    33.7    28.4

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Blocks                         4                  21.46             5.36
Strains                        3                 134.45            44.82
Error                         12                  26.26             2.19


(i) How many replications were there? (ii) Estimate $s_{CR}^2/s_{RB}^2$. (iii) Estimate the relative
amount of information by Fisher’s formula. Ans. (ii) 1.30, (iii) 1.26


EXAMPLE 11.7.2 — In example 11.7.1, verify that the LSD and the Q methods both 
show D inferior to the other strains, but reveal no differences among the other strains. 

11.8 — Latin squares. In agricultural field experiments, there is fre- 
quently a gradient in fertility running parallel to one of the sides of the 
field. Sometimes, gradients run parallel to both sides and sometimes, in a 
new field, it is not known in which direction the predominant gradient 
may run. A useful plan for such situations is the Latin square. With four
treatments, A, B, C, D, it may be like this:

A  B  C  D
C  A  D  B
D  C  B  A
B  D  A  C


The rows and columns of the square are parallel to the two sides of the 
field. Each treatment appears once in every row and once in every column, 
this being the basic property of a Latin square. Differences in fertility 
between rows and differences between columns are both eliminated from 
the comparison of the treatment means, with a resultant increase in the 
precision of the experiment. 

In numerous other situations the Latin square is also effective in 
controlling two sources of variation of which the investigator has predic- 
tive knowledge. In psychology and medicine, the human subject fre- 
quently comprises a replication of the experiment, receiving all the treat- 
ments in succession, with intervening intervals in which the effects of pre- 
vious treatment will have died away. However, a systematic effect of the 
order in which the treatments are given can often be detected. This is 
controlled by making the columns of the square represent the order, 
while rows represent subjects. In animal nutrition, the effects of both 
litter and condition of the animal may be removed from the estimates of 
treatment means by the use of a Latin square. 

To construct a Latin square, write down a systematic arrangement
of the letters and rearrange rows and columns at random. Then assign
treatments at random to the letters. For refinements, see (7).
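The construction just described can be sketched in Python. This is an illustrative sketch under stated assumptions: it starts from a cyclic square, shuffles rows, columns, and the letter assignment with the standard `random` module, and does not implement the refinements of (7). In particular, for a of 4 or more this procedure cannot reach every possible Latin square of that size.

```python
import random


def random_latin_square(letters, rng=random):
    """Randomize a cyclic a x a square: shuffle rows, columns, and letters."""
    a = len(letters)
    base = [[(i + j) % a for j in range(a)] for i in range(a)]  # systematic square
    rows = list(range(a))
    cols = list(range(a))
    labels = list(letters)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)            # random assignment of treatments to symbols
    return [[labels[base[r][c]] for c in cols] for r in rows]


def is_latin(square):
    """True if every row and every column contains each symbol exactly once."""
    symbols = set(square[0])
    a = len(square)
    return (len(symbols) == a
            and all(len(row) == a and set(row) == symbols for row in square)
            and all(set(col) == symbols for col in zip(*square)))


square = random_latin_square("ABCD", rng=random.Random(2024))
for row in square:
    print(" ".join(row))
```

Permuting whole rows or whole columns never breaks the Latin property, which is why the shuffles can be applied independently.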

The model for a Latin square experiment (model I) is

$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}; \quad i, j, k = 1, \dots, a; \quad \varepsilon_{ijk} = \mathcal{N}(0, \sigma)$$

where $\alpha$, $\beta$, and $\gamma$ indicate treatment, row, and column effects, with the
usual convention that their sums are zero. The assumption of additivity
is carried a step further than with a two-way classification, since we assume
the effects of all three factors to be additive.

It follows from the model that a treatment mean is an unbiased
estimate of $\mu + \alpha_i$, the effects of rows and columns canceling out because
of the symmetry of the design. The standard error of $\bar{X}_{i..}$ is $\sigma/\sqrt{a}$. The
estimate $\hat{X}_{ijk}$ of the observation $X_{ijk}$ made from the fitted linear model is

$$\hat{X}_{ijk} = \bar{X}_{...} + (\bar{X}_{i..} - \bar{X}_{...}) + (\bar{X}_{.j.} - \bar{X}_{...}) + (\bar{X}_{..k} - \bar{X}_{...})$$

Hence, the deviation from the fitted model is

$$D_{ijk} = X_{ijk} - \hat{X}_{ijk} = X_{ijk} - \bar{X}_{i..} - \bar{X}_{.j.} - \bar{X}_{..k} + 2\bar{X}_{...}$$

As in the two-way classification, the error sum of squares in the
analysis of variance is the sum of the $D_{ijk}^2$, and the Error mean square is
an unbiased estimate of $\sigma^2$.
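The fitted values and deviations can be computed directly from the treatment, row, and column means. A minimal Python sketch (the function name is hypothetical), run on the 3 × 3 square of example 11.8.1, where the Error S.S. should come out as 6:

```python
from collections import defaultdict
from statistics import mean


def latin_square_deviations(cells):
    """cells: (row, column, treatment, value) tuples for one a x a square.
    Returns D = X - Xbar(treatment) - Xbar(row) - Xbar(column) + 2 Xbar(grand)."""
    by_row, by_col, by_trt = defaultdict(list), defaultdict(list), defaultdict(list)
    for r, c, t, x in cells:
        by_row[r].append(x)
        by_col[c].append(x)
        by_trt[t].append(x)
    grand = mean(x for _, _, _, x in cells)
    row_m = {k: mean(v) for k, v in by_row.items()}
    col_m = {k: mean(v) for k, v in by_col.items()}
    trt_m = {k: mean(v) for k, v in by_trt.items()}
    return {(r, c): x - trt_m[t] - row_m[r] - col_m[c] + 2 * grand
            for r, c, t, x in cells}


cells = [(1, 1, "B", 23), (1, 2, "A", 17), (1, 3, "C", 29),
         (2, 1, "A", 16), (2, 2, "C", 25), (2, 3, "B", 16),
         (3, 1, "C", 24), (3, 2, "B", 18), (3, 3, "A", 12)]
D = latin_square_deviations(cells)
error_ss = sum(d * d for d in D.values())
print(error_ss)
```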

Table 11.8.1 shows the field layout and yields of a 5 x 5 Latin square
experiment on the effects of spacing on yields of millet plants (8). In the 
computations, the sums for rows and columns are supplemented by sums 

TABLE 11.8.1

Yields (Grams) of Plots of Millet Arranged in a Latin Square
(Spacings: A, 2-inch; B, 4; C, 6; D, 8; E, 10)

                          Column
Row        1         2         3         4         5          Sum
1       B: 257    E: 230    A: 279    C: 287    D: 202      1,255
2       D: 245    A: 283    E: 245    B: 280    C: 260      1,313
3       E: 182    B: 252    C: 280    D: 246    A: 250      1,210
4       A: 203    C: 204    D: 227    E: 193    B: 259      1,086
5       C: 231    D: 271    B: 266    A: 334    E: 338      1,440
Sum      1,118     1,240     1,297     1,340     1,309      6,304

Summary by Spacing

          A: 2"     B: 4"     C: 6"     D: 8"     E: 10"
Sum       1,349     1,314     1,262     1,191     1,188      6,304
Mean      269.8     262.8     252.4     238.2     237.6      252.2
TABLE 11.8.1 (Continued)

Correction: (6,304)²/25 = 1,589,617
Total: (257)² + ... + (338)² − 1,589,617 = 36,571
Rows: [(1,255)² + ... + (1,440)²]/5 − 1,589,617 = 13,601
Columns: [(1,118)² + ... + (1,309)²]/5 − 1,589,617 = 6,146
Spacings: [(1,349)² + ... + (1,188)²]/5 − 1,589,617 = 4,156
Error: 12,668

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Total                         24                 36,571
Rows                           4                 13,601            3,400
Columns                        4                  6,146            1,536
Spacings                       4                  4,156            1,039
Error                         12                 12,668            1,056


and means for treatments (spacings). By the usual rules, sums of squares
for Rows, Columns, and Spacings are calculated. These are subtracted
from the Total S.S. to give the Error S.S. with $(a - 1)(a - 2) = 12$ d.f.
Table 11.8.2 shows the expected values of the mean squares, with the
usual notation. For illustration we have presented the results that apply
if the $\beta_j$ and $\gamma_k$ in rows and columns represent random effects, with fixed
treatment effects $\alpha_i$.
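The sums of squares in table 11.8.1 follow from the usual rules. This Python sketch (names hypothetical) recomputes them from the layout and yields. Because it does not round intermediate sums of squares, its Error S.S. (about 12,667.3) differs in the last digit from the 12,668 obtained by subtracting rounded values.

```python
layout = ["BEACD", "DAEBC", "EBCDA", "ACDEB", "CDBAE"]   # spacing letters by row
yields = [[257, 230, 279, 287, 202],
          [245, 283, 245, 280, 260],
          [182, 252, 280, 246, 250],
          [203, 204, 227, 193, 259],
          [231, 271, 266, 334, 338]]


def latin_square_anova(layout, yields):
    """Return {source: (d.f., S.S., M.S.)} and the Total S.S."""
    a = len(yields)
    cf = sum(map(sum, yields)) ** 2 / a ** 2                  # correction term
    total = sum(x * x for row in yields for x in row) - cf
    rows_ss = sum(sum(row) ** 2 for row in yields) / a - cf
    cols_ss = sum(sum(col) ** 2 for col in zip(*yields)) / a - cf
    trt_totals = {}
    for letters, row in zip(layout, yields):
        for t, x in zip(letters, row):
            trt_totals[t] = trt_totals.get(t, 0) + x
    trt_ss = sum(v ** 2 for v in trt_totals.values()) / a - cf
    error_ss = total - rows_ss - cols_ss - trt_ss
    parts = {"Rows": (a - 1, rows_ss), "Columns": (a - 1, cols_ss),
             "Spacings": (a - 1, trt_ss), "Error": ((a - 1) * (a - 2), error_ss)}
    return {k: (df, ss, ss / df) for k, (df, ss) in parts.items()}, total


table, total = latin_square_anova(layout, yields)
print(f"Total {total:10.0f}")
for source, (df, ss, ms) in table.items():
    print(f"{source:9s} {df:3d} {ss:10.0f} {ms:8.0f}")
```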

TABLE 11.8.2

Component Analysis in Latin Square

Source of Variation    Degrees of Freedom    Mean Square    Estimates of
Rows, R                     a − 1               M_R          σ² + aσ_R²
Columns, C                  a − 1               M_C          σ² + aσ_C²
Treatments, A               a − 1               M_A          σ² + aκ_A²
Error                   (a − 1)(a − 2)          M_E          σ²


This experiment is typical of many in which the treatments consist of
a series of levels of a variable, in this case width of spacing. The objective
is to determine the relation between the treatment mean yields, which we
will now denote by $Y_i$, and width of spacing $X_i$. Inspection of the mean
yields suggests that the relation may be linear, the yield decreasing steadily
as spacing increases. The $X_i$, $x_i$, and $Y_i$ are shown in table 11.8.3.


TABLE 11.8.3

Data for Calculating the Regression of Yield on Spacing

Spacing, X_i              2        4        6        8       10
x_i = X_i − X̄            −4       −2        0        2        4
Y_i (gms.)            269.8    262.8    252.4    238.2    237.6


The regression coefficient of yield on spacing is

$$b = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2} = \frac{\sum x_i Y_i}{\sum x_i^2} = \frac{-178.0}{40} = -4.45,$$


the units being grams per inch increase in spacing. Notice that $b$ is a
comparison among the treatment means, with $\lambda_i = x_i/\sum x_i^2$. From Rule
10.7.1, the standard error of $b$ is

$$s_b = \sqrt{s^2\sum\lambda_i^2/a} = \sqrt{s^2/a\sum x_i^2} = \sqrt{(1056)/(5)(40)} = 2.298$$

With 12 d.f., 95% confidence limits for the population regression are
+0.6 and −9.5 grams per inch increase. The linear decrease in yield is
not quite significant, since the limits include 0.
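The slope, its standard error, and the confidence limits can be reproduced in a few lines. In this sketch the two-sided 5% value of t with 12 d.f. is hard-coded as 2.179 from a standard t table (an assumption of the sketch, since the Python standard library has no t quantile function); s² = 1056 is the Error mean square of table 11.8.1.

```python
import math

X = [2, 4, 6, 8, 10]                        # spacing, inches
Y = [269.8, 262.8, 252.4, 238.2, 237.6]     # treatment mean yields, grams
a = 5                                        # plots per mean
s2 = 1056                                    # Error mean square, 12 d.f.
t_05_12 = 2.179                              # two-sided 5% t, 12 d.f. (table value)

xbar = sum(X) / len(X)
x = [xi - xbar for xi in X]
sum_x2 = sum(d * d for d in x)
b = sum(d * y for d, y in zip(x, Y)) / sum_x2      # sum(x*Y) / sum(x^2)
se_b = math.sqrt(s2 / (a * sum_x2))
lower, upper = b - t_05_12 * se_b, b + t_05_12 * se_b
print(f"b = {b:.2f}, s_b = {se_b:.3f}, limits {lower:.1f} to {upper:.1f}")
```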

In the analysis of variance, the Treatments S.S. can be partitioned
into a part representing the linear regression on width of spacing and a
part representing the deviations of the treatment means from the linear
regression. This partition provides new information. If the true regression
of the means on width of spacing is linear, the Deviations mean square
should be an estimate of $\sigma^2$. If the true regression is curved, the Deviations
mean square is inflated by the failure of the fitted straight line to
represent the curved relationship. Consequently, $F$ = Deviations M.S./
Error M.S. tests whether the straight line is an adequate fit.

The sum of squares for Regression (1 d.f.) can be computed by the
methods on regression given in chapter 6. In section 6.15 (p. 162) this
sum of squares was presented as $(\sum xy)^2/\sum x^2$ (table 6.15.3). In this
example we have already found $\sum xy = \sum x_i Y_i = -178.0$ and $\sum x^2 = 40$, giving
$(\sum xy)^2/\sum x^2 = (178.0)^2/40 = 792.1$. Since, however, each $Y_i$ is the mean
of five observations, we multiply by 5 when entering this term in the
analysis of variance, giving 3,960. The S.S. for Deviations from the
regression is found by subtracting 3,960 from the total S.S. for Spacings,
4,156 (table 11.8.4).


TABLE 11.8.4

Analysis of Regression of Spacing Mean on Width of Spacing
(Millet experiment)

Source of Variation         Degrees of Freedom    Sum of Squares    Mean Square       F
Spacings (table 11.8.1)              4                 4,156
   Regression                        1                 3,960            3,960        3.75
   Deviations                        3                   196               65        0.06
Error (table 11.8.1)                12                12,668            1,056



The F-ratio for Deviations is very small, 0.06, giving no indication
that the regression is curved. The F for Regression, 3.75, is not quite
significant, this test being the same as the t-test for $b$.
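The partition in table 11.8.4 can be sketched from the five means alone, remembering that each mean is based on five plots. The Error mean square, 1056 with 12 d.f., is taken from table 11.8.1; the variable names are illustrative.

```python
x = [-4, -2, 0, 2, 4]                       # coded spacings, X_i - 6
Y = [269.8, 262.8, 252.4, 238.2, 237.6]     # spacing mean yields
r = 5                                        # plots per mean
ms_error = 1056                              # Error M.S., 12 d.f.

ybar = sum(Y) / len(Y)
spacings_ss = r * sum((y - ybar) ** 2 for y in Y)              # 4 d.f.
sum_xy = sum(xi * y for xi, y in zip(x, Y))
regression_ss = r * sum_xy ** 2 / sum(xi * xi for xi in x)     # 1 d.f.
deviations_ss = spacings_ss - regression_ss                    # 3 d.f.

f_regression = regression_ss / ms_error
f_deviations = (deviations_ss / 3) / ms_error
print(f"{spacings_ss:.0f} {regression_ss:.0f} {deviations_ss:.0f}")
print(f"F: regression {f_regression:.2f}, deviations {f_deviations:.2f}")
```

Since the Regression line has 1 d.f., its F equals the square of the t for b: (4.45/2.298)² is about 3.75.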

The results of this experiment are probably disappointing. In trying
to discover the best width of spacing, an investigator hopes to obtain a
curved regression, with reduced yields at the narrowest and widest spacings,
so that his range of spacings straddles the optimum. As it is, assuming
the linear regression is real, the best spacing may lie below 2 in. Methods
of dealing with curved regressions in the analysis of variance are given in
chapter 12.

Since the number of replications in the Latin square is equal to the
number of treatments, the experimenter is ordinarily limited to eight or ten
treatments if he uses this design. For four or fewer treatments, the error
degrees of freedom are fewer than desirable, $(a - 1)(a - 2) = (3)(2) = 6$
for the 4 x 4. This difficulty can be remedied by replicating the squares.

The relative efficiency of a Latin square experiment as compared to
complete randomization is

$$\frac{M_R + M_C + (a - 1)M_E}{(a + 1)M_E}$$

Substituting the millet data:

$$\text{Relative efficiency} = \frac{3400 + 1536 + (5 - 1)(1056)}{(5 + 1)(1056)} = 145\%,$$

a gain of 45% over complete randomization.

There may be some interest in knowing the relative efficiency as compared
to a randomized blocks experiment in which either rows or columns
were lacking. In the millet experiment, since the column mean square was
small (this may have been an accident of sampling), it might have been
omitted and the rows retained as blocks. The relative efficiency of the
Latin square is

$$\frac{M_C + (a - 1)M_E}{aM_E} = \frac{1536 + (5 - 1)(1056)}{(5)(1056)} = 109\%$$
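Both efficiency formulas for the Latin square are one-liners. The sketch below (function names hypothetical) uses the millet mean squares as rounded in table 11.8.1:

```python
def ls_vs_complete_randomization(ms_rows, ms_cols, ms_error, a):
    """Relative efficiency of an a x a Latin square vs. complete randomization."""
    return (ms_rows + ms_cols + (a - 1) * ms_error) / ((a + 1) * ms_error)


def ls_vs_blocks(ms_omitted, ms_error, a):
    """Efficiency relative to randomized blocks that keep one of the two
    classifications; ms_omitted is the mean square of the dropped one."""
    return (ms_omitted + (a - 1) * ms_error) / (a * ms_error)


print(f"{ls_vs_complete_randomization(3400, 1536, 1056, 5):.0%}")  # vs. complete randomization
print(f"{ls_vs_blocks(1536, 1056, 5):.0%}")                         # rows kept as blocks
```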



Kempthorne (4) reminds us that this may not be a realistic comparison.
For the blocks experiment the shape of the plots would presumably
have been changed, improving the efficiency of that experiment. In
this millet experiment, appropriately shaped plots in randomized blocks
might well have compensated for the column control.


EXAMPLE 11.8.1. Here is a Latin square for easy computation. Treatments are
indicated by A, B, and C.

              Columns
Rows     1        2        3
1      B: 23    A: 17    C: 29
2      A: 16    C: 25    B: 16
3      C: 24    B: 18    A: 12


The mean squares are: rows, 21; columns, 3; treatments, 93; remainder, 3. 
EXAMPLE 11.8.2. Fit the linear model for Latin squares to the data of example
11.8.1. Verify the fitting by the relation $\sum D_{ijk}^2 = 6$.

EXAMPLE 11.8.3 — In experiments affecting the milk yield of dairy cows the great 
variation among individuals requires large numbers of animals for evaluating moderate dif- 
ferences. Efforts to apply several treatments successively to the same cow are complicated 
by the decreasing milk flow, by the shapes of the lactation curves, by carry-over effects, and 
by presumed correlation among the errors, $\varepsilon_{ijk}$. An effort was made to control these
difficulties by the use of several pairs of orthogonal Latin squares (9), the columns representing
cows, the rows successive periods during lactation, the treatments being A = roughage,
B = limited grain, C = full grain.

For this example, a single square is presented, no effort being made to deal with carry- 
over effects. The entries are pounds of milk for a 6-week period. Compute the analysis of 
variance. 


                  Cow
Period      1          2          3
I        A: 608     B: 885     C: 940
II       B: 715     C: 1087    A: 766
III      C: 844     A: 711     B: 832

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Periods                        2                  5,900            2,950
Cows                           2                 47,214           23,607
Treatments                     2                103,436           51,718
Error                          2                  4,843            2,422


11.9. Missing data. Accidents often result in the loss of data. Crops
may be destroyed, animals die, or errors made in the application of the
treatments or in recording. Although the least squares procedure can be
applied to the data that are present, missing items destroy the symmetry
and simplicity of the analysis. The computational methods that have been
presented cannot be used. Fortunately, the missing data can be estimated 
by least squares and entered in the vacant cells of the table. Application 
of the usual analysis of variance, with some modifications, then gives 
results that are correct enough for practical purposes. 

In these methods the missing items must not be due to failure of a 
treatment. If a treatment has killed the plants, producing zero yield, this 
should be entered as 0, not as a missing value. 

In a one-way classification (complete randomization) the effect of 
missing values is merely to reduce the sample sizes in the affected classes. 
The analysis is handled correctly by the methods for one-way classifications
with unequal numbers (section 10.12). No substitution of the missing
data is required.

In randomized blocks, a single missing value is estimated by the 
formula (26) 

$$X = \frac{aT + bB - S}{(a - 1)(b - 1)}$$

where

a = number of treatments
b = number of blocks
T = sum of items with same treatment as missing item
B = sum of items in same block as missing item
S = sum of all observed items

As an example, table 11.9.1 shows the yields in an experiment on four 
strains of Gallipoli wheat, in which we have supposed that the yield for 
strain D in block 1 is missing. We have 


$T = 112.6$, $B = 96.4$, $S = 627.1$, $a = 4$, $b = 5$.

$$X = \frac{4(112.6) + (5)(96.4) - 627.1}{(3)(4)} = 25.4 \text{ pounds}$$


TABLE 11.9.1

Yields of Four Strains of Wheat in Five Randomized Blocks
(Pounds Per Plot) With One Missing Value

                             Block
Strain      1       2       3       4       5       Total
A         32.3    34.0    34.3    35.0    36.5      172.1
B         33.3    33.0    36.3    36.8    34.5      173.9
C         30.8    34.3    35.3    32.3    35.8      168.5
D          ...    26.0    29.8    28.0    28.8      112.6
Total     96.4   127.3   135.7   132.1   135.6      627.1

Analysis of Variance (With 25.4 Inserted)

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Squares
Blocks                         4                  35.39
Strains                        3                 171.36        57.12 (45.79)
Error                         11                  17.33         1.58
Total                         18                 224.08



This value is entered in the table as the yield of the missing plot. All
sums of squares in the analysis of variance are then computed as usual.
However, the degrees of freedom in the Total and Error S.S. are both
reduced by 1, since there are actually only 18 d.f. for the Total S.S. and
11 for Error.
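The missing-value formula for randomized blocks, with the accompanying reduction in d.f., can be sketched as follows. The function name is hypothetical; the inputs are the totals quoted above for strain D in block 1.

```python
def rb_missing_value(a, b, T, B, S):
    """Least squares estimate of one missing plot in randomized blocks.
    a treatments, b blocks; T, B = treatment and block totals containing the
    missing plot; S = grand total of the observed plots."""
    return (a * T + b * B - S) / ((a - 1) * (b - 1))


a, b = 4, 5
x = rb_missing_value(a, b, T=112.6, B=96.4, S=627.1)
total_df = a * b - 1 - 1            # one d.f. lost for the missing plot
error_df = (a - 1) * (b - 1) - 1
print(round(x, 1), total_df, error_df)
```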

This method gives the correct least squares estimates of the treatment
means and of the Error mean square. For the comparison of treatment
means, the s.e. of the difference between the mean with a missing value
and another treatment mean is not $\sqrt{2s^2/b}$ but the larger quantity

$$\sqrt{s^2\left[\frac{2}{b} + \frac{a}{b(b - 1)(a - 1)}\right]} = \sqrt{1.58\left[\frac{2}{5} + \frac{4}{(5)(4)(3)}\right]} = \pm 0.859,$$

as against ±0.795 for a pair of treatments with no missing values.

The Treatments (Strains) mean square in the analysis of variance is
slightly inflated. The correction for this upward bias is to subtract from
the mean square

$$\frac{\{B - (a - 1)X\}^2}{a(a - 1)^2} = \frac{\{96.4 - (3)(25.4)\}^2}{(4)(3)^2} = 11.33$$

This gives 57.12 − 11.33 = 45.79 for the correct mean square.
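The two corrections can be checked numerically. In this sketch (helper names hypothetical) s² = 1.58 is the Error mean square with 25.4 inserted, and 57.12 is the unadjusted Strains mean square:

```python
import math


def se_diff_missing(s2, a, b):
    """s.e. of a difference involving the treatment with the missing plot."""
    return math.sqrt(s2 * (2 / b + a / (b * (b - 1) * (a - 1))))


def treatments_bias(B, X, a):
    """Upward bias in the Treatments mean square (one missing plot)."""
    return (B - (a - 1) * X) ** 2 / (a * (a - 1) ** 2)


a, b, s2 = 4, 5, 1.58
se_missing = se_diff_missing(s2, a, b)
se_usual = math.sqrt(2 * s2 / b)
print(round(se_missing, 3), round(se_usual, 3))
bias = treatments_bias(B=96.4, X=25.4, a=a)
print(round(bias, 2), round(57.12 - bias, 2))
```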

This analysis does not in any sense recover the lost information, but 
makes the best of what we have. 

For the Latin square the formulas are:

$$X = \frac{a(R + C + T) - 2S}{(a - 1)(a - 2)}$$

Deduction from Treatments mean square for bias

$$= \frac{[S - R - C - (a - 1)T]^2}{(a - 1)^3(a - 2)^2}$$

where $a$ is the number of treatments, rows, or columns; $R$, $C$, and $T$ are the
totals of the observed items in the row, column, and treatment containing
the missing item; and $S$ is the sum of all observed items.

To illustrate, suppose that in example 11.8.3 the milk yield, 608
pounds, for Cow 1 in Period I was missing. Table 11.9.2 gives the resulting
data and analysis. The correct Treatments mean square is 40,408.


$$X = \frac{3(1825 + 1559 + 1477) - 2(6780)}{(2)(1)} = 512 \text{ pounds}$$

$$\text{Bias} = \frac{[6780 - 1825 - 1559 - (2)(1477)]^2}{(2)(2)(2)(1)(1)} = 24{,}420$$

TABLE 11.9.2

3 x 3 Latin Square With One Missing Value


                        Cow
Period        1           2           3          Total      Treatments
I           A: ...      B: 885      C: 940       1,825      A: 1,477
II          B: 715      C: 1,087    A: 766       2,568      B: 2,432
III         C: 844      A: 711      B: 832       2,387      C: 2,871
Total        1,559       2,683       2,538       6,780         6,780

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Squares
Rows (Periods)                 2                  9,847
Columns (Cows)                 2                 68,185
Treatments                     2                129,655        64,828 (40,408)
Error                          1                  2,773         2,773
Total                          7                210,460
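The Latin square formulas translate directly. A Python sketch (names hypothetical), using R = 1825 (Period I), C = 1559 (Cow 1), T = 1477 (treatment A), and S = 6780 from table 11.9.2:

```python
def ls_missing_value(a, R, C, T, S):
    """Estimate of one missing plot in an a x a Latin square."""
    return (a * (R + C + T) - 2 * S) / ((a - 1) * (a - 2))


def ls_treatments_bias(a, R, C, T, S):
    """Deduction from the Treatments mean square for the upward bias."""
    return (S - R - C - (a - 1) * T) ** 2 / ((a - 1) ** 3 * (a - 2) ** 2)


args = dict(a=3, R=1825, C=1559, T=1477, S=6780)
x = ls_missing_value(**args)
bias = ls_treatments_bias(**args)
print(x, bias, round(64828 - bias))
```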
Of course, no worthwhile conclusions are likely to flow from a single
3 x 3 square with a missing value, the Error M.S. having only 1 d.f. The
s.e. of the difference between the treatment mean with the missing value
and any other treatment mean is

$$\sqrt{s^2\left[\frac{2}{a} + \frac{1}{(a - 1)(a - 2)}\right]}$$

Two or more missing data require more complicated methods. But 
for a few missing values an iterative scheme may be used for estimation. 

To illustrate the iteration, the data in table 11.9.3 seem adequate.
Start by entering a reasonable value for one of the missing data, say
$X_{22} = 10.5$. This could be $\bar{X}_{..} = 9.3$, but both the block and treatment
means are above average, so 10.5 seems better. From the formula, $X_{31}$ is

$$X_{31} = \frac{(3)(27) + (3)(21) - 75.5}{(3 - 1)(3 - 1)} = 17.1$$

Substituting $X_{31} = 17.1$ in the table, try for a better estimate of $X_{22}$ by
using the formula for $X_{22}$ missing:

$$X_{22} = \frac{(3)(23) + (3)(20) - 82.1}{4} = 11.7$$

With this revised estimate of $X_{22}$, re-estimate $X_{31}$:

$$X_{31} = \frac{(3)(27) + (3)(21) - 76.7}{4} = 16.8$$

Finally, with this new value of $X_{31}$ in the table, calculate $X_{22} = 11.8$. One
stops because with $X_{22} = 11.8$ no change occurs when $X_{31}$ is recalculated.
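The iteration is easy to automate. The sketch below (function name and stopping rule are illustrative choices, not from the text) re-estimates each missing cell of table 11.9.3 in turn with the single-missing-value formula, holding the other missing cell at its current estimate, until the estimates stop changing. Carried to convergence without rounding, it settles at the values reached above.

```python
def iterate_missing_rb(data, order, start, tol=1e-9, max_iter=100):
    """data: rows = treatments, columns = blocks, None = missing.
    order: missing cells (i, j) in the order they are re-estimated.
    start: initial guesses for missing cells used before their first update."""
    a, b = len(data), len(data[0])
    grid = [[0.0 if x is None else float(x) for x in row] for row in data]
    for (i, j), guess in start.items():
        grid[i][j] = guess
    for _ in range(max_iter):
        largest_change = 0.0
        for i, j in order:
            old = grid[i][j]
            grid[i][j] = 0.0                    # treat this cell as missing
            T = sum(grid[i])                    # treatment total
            B = sum(row[j] for row in grid)     # block total
            S = sum(map(sum, grid))             # grand total
            grid[i][j] = (a * T + b * B - S) / ((a - 1) * (b - 1))
            largest_change = max(largest_change, abs(grid[i][j] - old))
        if largest_change < tol:
            break
    return {cell: grid[cell[0]][cell[1]] for cell in order}


data = [[6, 5, 4],          # A
        [15, None, 8],      # B
        [None, 15, 12]]     # C
# Estimate X_31 (treatment C, block 1) first, starting from X_22 = 10.5
estimates = iterate_missing_rb(data, order=[(2, 0), (1, 1)], start={(1, 1): 10.5})
print({k: round(v, 2) for k, v in estimates.items()})
```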

In the analysis of variance, subtract 2 d.f. from the Total and Error
sums of squares. The Treatments S.S. and M.S. are biased upwards. To
obtain the correct Treatments S.S., reanalyze the data in table 11.9.3,
ignoring the treatments and the missing values, as a one-way classification
with unequal numbers, the blocks being the classes. The new Error
(Within blocks) S.S. will be found to be 122.50 with 4 d.f. Subtract from
this the Error S.S. that you obtained in the randomized blocks analysis of
the completed data. This is 6.40, with 2 d.f. The difference, 122.50 − 6.40
= 116.10, with 4 − 2 = 2 d.f., is the correct Treatments S.S. The F ratio
is 58.05/3.20 = 18.1, with 2 and 2 d.f.
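The adjusted Treatments S.S. can be sketched the same way. This Python illustration (helper names hypothetical) takes the completed table with 11.8 and 16.8 inserted, computes its Error S.S., and subtracts that from the within-blocks S.S. of the observed values only:

```python
completed = [[6, 5, 4], [15, 11.8, 8], [16.8, 15, 12]]
observed = [[6, 5, 4], [15, None, 8], [None, 15, 12]]


def rb_error_ss(grid):
    """Error S.S. of a complete randomized blocks table (rows = treatments)."""
    a, b = len(grid), len(grid[0])
    cf = sum(map(sum, grid)) ** 2 / (a * b)
    total = sum(x * x for row in grid for x in row) - cf
    treatments = sum(sum(row) ** 2 for row in grid) / b - cf
    blocks = sum(sum(col) ** 2 for col in zip(*grid)) / a - cf
    return total - treatments - blocks


def within_blocks_ss(grid):
    """One-way analysis with blocks as classes, ignoring missing cells."""
    ss = 0.0
    for col in zip(*grid):
        xs = [x for x in col if x is not None]
        m = sum(xs) / len(xs)
        ss += sum((x - m) ** 2 for x in xs)
    return ss


error_ss = rb_error_ss(completed)        # 2 d.f. after subtracting 2
within_ss = within_blocks_ss(observed)   # 4 d.f.
treatments_ss = within_ss - error_ss     # 2 d.f.
F = (treatments_ss / 2) / (error_ss / 2)
print(round(error_ss, 2), round(within_ss, 2), round(treatments_ss, 2), round(F, 1))
```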

The same method applies to a Latin square with two missing values,
with repeated use of the formula for inserting a missing value in a Latin
square. Formulas needed for confidence limits and t-tests involving the
treatment means are given in (3). For experiments analyzed by electronic
computers, a general method of estimating missing values is presented in
(10).



TABLE 11.9.3

Randomized Blocks Experiment With Two Missing Values

                      Blocks
Treatments      1       2       3       Sums
A               6       5       4        15
B              15      ...      8        23
C              ...     15      12        27
Sums           21      20      24        65


11.10. Non-conformity to model. In the standard analyses of variance
the model specifies that the effects of the different fixed factors (treatments,
rows, columns, etc.) are additive, and that the errors are normally
and independently distributed with the same variance. It is unlikely that
these ideal conditions are ever exactly realized in practice. Much research
has been done to investigate the consequences of various types of failure
in the assumptions; for an excellent review, see (11). Minor failures do
not greatly disturb the conclusions drawn from the standard analysis. In
subsequent sections some advice is given on the detection and handling of
more serious failures. For this discussion the types of failure are classified
into gross errors, lack of independence of errors, unequal error variances
due to the nature of the treatments, non-normality of errors, and
non-additivity.

11.11 — Gross errors: rejection of extreme observations. A measure- 
ment may be read, recorded, or transcribed wrongly, or a mistake may be 
made in the way in which the treatment was applied for this measurement. 
A major error greatly distorts the mean of the treatment involved, and, 
by inflating the error variance, affects conclusions about the other treat- 
ments as well. The principal safeguards are vigilance in carrying out the 
operating instructions for the experiment and in the measuring and re- 
cording process, and eye inspection of the data. 

If a figure in the data to be analyzed looks suspicious, an inquiry 
about this observation sometimes shows that there was a gross error and 
may also reveal the correct value for this observation. (One should check 
that the same source of error has not affected other observations also.) 
With two-way and Latin square classifications, it is harder to spot an 
unusual observation in the original data, because the expected value of 
any observation depends on the row, column, and treatment effects. In- 
stead, look at the residuals of the observations from their expected values. 
In the two-way classification, the residual $D_{ij}$ is

$$D_{ij} = X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..}$$

while in the Latin square,

$$D_{ijk} = X_{ijk} - \bar{X}_{i..} - \bar{X}_{.j.} - \bar{X}_{..k} + 2\bar{X}_{...}$$
If no explanation is discovered that enables an extreme residual to be
corrected, we may consider rejecting it and analyzing the data by the
method in section 11.9 for results with missing observations. The discussion
of rules for the rejection of observations began well over a century
ago in astronomy and geodesy. Most rules have been based on something 
like a test of significance. The investigator computes the probability 
that a residual as large as the suspect would occur by chance if there is no 
gross error (taking account of the fact that the largest residual was 
selected). If this probability is sufficiently small, the suspect is rejected. 

Anscombe (12) points out that it may be wiser to think of a rejection 
rule as analogous to an insurance policy on a house or an automobile. 
We pay a premium to protect us against the possibility of damage. In 
considering whether a proposed policy is attractive, we take into account 
the size of the premium, our guesses as to the probability that damage will 
occur, and our estimate of the amount of likely damage if there is a mishap. 

A premium is involved in a rejection rule because any rule occa- 
sionally rejects an observation that is not a gross error. When this 
happens, the mean of the affected treatment is less accurately estimated 
than if we had not applied the rule. If these erroneous rejections cause 
the variances of the estimated treatment means to be increased by P%,
on the average over repeated applications, the rule is said to have a
premium of P%.

Anscombe and Tukey (13) present a rule that rejects an observation
whose residual has the value $d$ if $|d| > Cs$, where $C$ is a constant to be
determined and $s$ is the S.D. of the experimental errors (square root of the
Error or Residuals mean square). For any small value of $P$, say 2½%
or 5%, an approximate method of computing $C$ is given (13). This
method applies to the one-way, two-way, and Latin square classifications,
as well as to other standard classifications with replication. The formula
for $C$ involves the number of Error d.f., say $f$, and the total number of
residuals, say $N$. In our notation the values of $f$ and $N$ are as follows:

Classification                          f                  N
One-way (a classes, n per class)        a(n − 1)           an
Two-way (a rows, b columns)             (a − 1)(b − 1)     ab
Latin square (a x a)                    (a − 1)(a − 2)     a²

The formula has three steps:

1. Find the one-tailed normal deviate $z$ corresponding to the probability
$fP/100N$, where $P$ is the premium expressed in per cents.

2. Calculate $K = 1.40 + 0.85z$.

3. $$C = K\left(1 - \frac{K^2 - 2}{4f}\right)\sqrt{\frac{f}{N}}$$

In order to apply this rule, first analyze the data and obtain the values
of $d$ and $s$. To illustrate, consider the randomized blocks wheat data
(table 11.9.1, p. 318) with $a = 4$, $b = 5$, that was used as an example of a
missing observation. This observation, for Strain D in Block 1, was
actually present and had a value 29.3. In the analysis of the complete
data, this observation gave the largest residual, 2.3, of all $N = 20$
observations. For the complete data, $s = 1.48$ with $f = 12$. In a rejection rule
with a 2½% premium, would this observation be rejected?

Since $N = 20$, we have $f/N = 0.6$ and $P = 2.5$, so that $fP/100N = (0.6)(0.025) = 0.015$.
From the normal table, this gives $z = 2.170$. Thus,

$$K = 1.40 + (0.85)(2.170) = 3.24$$
$$C = 3.24\left(1 - \frac{(3.24)^2 - 2}{48}\right)\sqrt{0.6} = 2.07$$

Since $Cs = (2.07)(1.48) = 3.06$, a residual of 2.3 does not call for rejection.
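The three steps of the Anscombe-Tukey rule can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later) supplying the normal deviate in place of the normal table. The function name is hypothetical; the second calculation checks example 11.11.1, taking s as the square root of the millet Error mean square, 1056.

```python
import math
from statistics import NormalDist


def rejection_constant(f, N, premium_pct):
    """C such that a residual d is rejected when |d| > C * s."""
    p = f * premium_pct / (100 * N)            # step 1: tail probability fP/100N
    z = NormalDist().inv_cdf(1 - p)            # one-tailed normal deviate
    K = 1.40 + 0.85 * z                        # step 2
    return K * (1 - (K * K - 2) / (4 * f)) * math.sqrt(f / N)   # step 3


# Wheat, randomized blocks: f = 12, N = 20, 2.5% premium, s = 1.48
C = rejection_constant(12, 20, 2.5)
print(round(C, 2), round(C * 1.48, 2), 2.3 > C * 1.48)

# Millet Latin square (example 11.11.1): f = 12, N = 25, 5% premium
Cs_millet = rejection_constant(12, 25, 5) * math.sqrt(1056)
print(round(Cs_millet, 1), 55.0 > Cs_millet)
```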

EXAMPLE 11.11.1. In the 5 x 5 Latin square on p. 313, the largest residual from the
fitted model is +55.0 for treatment E in row 5 and column 5. Would this observation be
rejected in a policy with a 5% premium? Ans. No. Cs = 58.5.


11.12 — Lack of independence in the errors. If care is not taken, an 
experiment may be conducted in a way that induces positive correlations 
between the errors for different replicates of the same treatment. In an 
industrial experiment, all the replications of a given treatment might be 
processed at the same time by the same technicians, in order to cut down 
the chance of mistakes or to save money. Any differences that exist be- 
tween the batches of raw materials used with different treatments or in 
the working methods of the technicians may create positive correlations 
within treatments. 

In the simplest case these situations are represented mathematically
by supposing that there is an intraclass correlation $\rho_I$ between any pair of
errors within the same treatment. In the absence of real treatment effects,
the mean square between treatments is an unbiased estimate of
$\sigma^2\{1 + (n - 1)\rho_I\}$, where $n$ is the number of replications, while the error
mean square is an unbiased estimate of $\sigma^2(1 - \rho_I)$, as pointed out in
section 10.20. The F-ratio is an estimate of $\{1 + (n - 1)\rho_I\}/(1 - \rho_I)$. With
$\rho_I$ positive, this ratio can be much larger than 1; for instance, with $\rho_I = 0.2$
and $n = 6$, the ratio is 2.5. Thus, positive correlations among the errors
within a treatment vitiate the F-test, giving too many significant results.
The disturbance affects t-tests also, and may be major.
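The expected inflation of the F-ratio under an intraclass correlation is simple to tabulate. A short sketch (function name hypothetical) that reproduces the value 2.5 quoted for ρ_I = 0.2 and n = 6 and shows how the inflation grows with n:

```python
def expected_f_inflation(rho, n):
    """Ratio of the expected Treatments and Error mean squares when errors
    within a treatment share intraclass correlation rho (no treatment effects)."""
    return (1 + (n - 1) * rho) / (1 - rho)


for rho in (0.0, 0.1, 0.2):
    print(rho, [round(expected_f_inflation(rho, n), 2) for n in (3, 6, 10)])
```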

In more complex situations the consequences of correlations among 
the errors have not been adequately studied, but there is reason to believe 
that they can be serious. Such correlations often go unnoticed, because 
their presence is difficult to detect by inspection of the data. The most 
effective precaution is the skillful use of randomization (section 4.12). If 
it is suspected that observations made within the same time period (e.g., 
morning or day) will be positively correlated, the order of processing of the 
treatments within a replication should be randomized. A systematic pattern
of errors, if detected, can sometimes be handled by constructing an
appropriate model for the statistical analysis. For examples, see (14), (15),
and (16).

11.13 — Unequal error variances due to treatments. Sometimes one or 
more treatments have variances differing from the rest, although there is 
no reason to suspect non-normality of errors. If the treatments consist 
of different amounts of lime applied to acid soil, the smallest dressings 
might give uniformly low yields with a small variance, while the highest 
dressings, being large enough to overcome the acidity, give good yields 
with a moderate variance. Intermediate dressings might give good yields 
on some plots and poor yields on others, and thus show the highest vari- 
ance. Another example occurs in experiments in which the treatments 
represent different measuring instruments, some highly precise and some 
cruder and less expensive. The average readings given by different instruments
are being compared in order to check whether the inexpensive
instruments are biased. Here we would obviously expect the variance to
differ from instrument to instrument. 

When the error variance is heterogeneous in this way, the F-test 
tends to give too many significant results. This disturbance is usually 
only moderate if every treatment has the same number of replications (11). 
Comparison of pairs or sub-groups of treatment means may, however, be 
seriously affected, since the usual estimate of error variance, which pools 
the variance over all treatments, will give standard errors that are too large
for some comparisons and too small for others. 

For any comparison $L = \sum \lambda_i \bar{X}_i$ among the class means in a one-way
classification, an unbiased estimate of its error variance is $V = \sum \lambda_i^2 s_i^2/n_i$, where
$n_i$ is the number of replications in $\bar{X}_i$ and $s_i^2$ is the mean square within the
$i$th class. This result holds whether the $\sigma_i^2$ are constant or not. If $v_i$
denotes $\lambda_i^2 s_i^2/n_i$, an approximate number of d.f. is assigned to $V$ by the
rule (25):

$$\text{d.f.} = \frac{V^2}{\sum_i \{v_i^2/(n_i - 1)\}}$$

When the $n_i$ are all equal, this becomes d.f. $= (n - 1)(\sum v_i)^2/\sum v_i^2$. For a
test of significance we take $t = \sum \lambda_i \bar{X}_i/\sqrt{V}$, with this number of d.f.
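The rule for $V$ and its approximate d.f. can be sketched as follows. The function `comparison_t` and the numbers in the usage lines are hypothetical, chosen only to show the arithmetic; the d.f. formula is the one given above.

```python
import math


def comparison_t(lambdas, means, variances, ns):
    """t and approximate d.f. for L = sum(lambda_i * mean_i) in a one-way
    classification, allowing each class its own variance s_i^2."""
    L = sum(lam * m for lam, m in zip(lambdas, means))
    # v_i = lambda_i^2 * s_i^2 / n_i; their sum V estimates the variance of L.
    v = [lam * lam * s2 / n for lam, s2, n in zip(lambdas, variances, ns)]
    V = sum(v)
    df = V ** 2 / sum(vi ** 2 / (n - 1) for vi, n in zip(v, ns))
    return L / math.sqrt(V), df


# Hypothetical numbers: two classes of 5 with very different variances.
t, df = comparison_t([1, -1], [10.0, 7.0], [4.0, 16.0], [5, 5])
print(f"t = {t:.2f} with about {df:.1f} d.f.")
```

With a pooled error variance the same comparison would be given 2(5 − 1) = 8 d.f.; the approximate rule gives fewer, about 5.9, because most of the variance of L comes from the more variable class.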

To obtain an unbiased estimate of the error variance of $L = \sum \lambda_i \bar{X}_i$
in a two-way classification, calculate the comparison $L_j = \sum_i \lambda_i X_{ij}$
separately in every block ($j = 1, 2, \ldots, b$). The average of the $b$ values $L_j$ is,
of course, $L$. The standard error of $L$ is $\sqrt{\sum_j (L_j - L)^2/\{b(b - 1)\}}$, with
$(b - 1)$ d.f., which will be scanty if $b$ is small.
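The block-by-block method can be sketched in the same way. `blockwise_comparison` is a name of our own, and the small table in the usage line is invented purely to illustrate the arithmetic.

```python
import math


def blockwise_comparison(lambdas, table):
    """Estimate L = sum(lambda_i * mean_i) and its standard error from the
    comparison L_j = sum_i lambda_i * X_ij computed separately in each block.

    table[i][j] is the observation on treatment i in block j.
    Returns (L, standard error, degrees of freedom = b - 1).
    """
    b = len(table[0])
    if b < 2:
        raise ValueError("need at least two blocks")
    l_j = [sum(lam * row[j] for lam, row in zip(lambdas, table)) for j in range(b)]
    L = sum(l_j) / b
    se = math.sqrt(sum((lj - L) ** 2 for lj in l_j) / (b * (b - 1)))
    return L, se, b - 1


# Hypothetical data: two treatments in three blocks, comparing 1 with 2.
L, se, df = blockwise_comparison([1, -1], [[10, 12, 11], [8, 9, 10]])
print(f"L = {L:.2f}, s.e. = {se:.3f} with {df} d.f.")
```

Here the per-block comparisons are 2, 3, and 1, so only 2 d.f. are available for the standard error, which illustrates how scanty the estimate is when b is small.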

If the trouble is caused by a few treatments whose means are sub- 
stantially different from the rest, a satisfactory remedy is to omit these 
treatments from the main analysis, since conclusions about them are clear 
on inspection. With a one-way or two-way classification, the remaining 
treatments are analyzed in the usual way. The analysis of a Latin square 
with one omitted treatment is described in (17), and with two omitted
treatments in (18).
11.14 — Non-normality. Variance-stabilizing transformations. In the
standard classifications, skewness in the distribution of errors tends to 
produce too many significant results in F- and t- tests. In addition, there 
is a loss of efficiency in the analysis, because when errors are non-normal 
the mean of the observed values for a treatment is, in general, not the most 
accurate estimate of the corresponding population mean for that treat- 
ment. If the mathematical form of the frequency distribution of the errors 
were known, a more efficient analysis could be developed. This approach 
is seldom attempted in practice, probably because the exact distribution of 
non-normal errors is rarely known and the more sophisticated analysis 
would be complicated. 

With data in which the effects of the fixed factors are modest, there is 
some evidence that non-normality does not distort the conclusions too 
seriously. However, one feature of non-normal distributions is that the 
variance is often related to the mean. In the Poisson distribution, the 
variance equals the mean. For a binomial proportion with mean p, the 
variance is $p(1 - p)/n$. Thus, if treatment or replication effects are large,
we expect unequal variances, with consequences similar to those discussed 
in the preceding section. 

If $\sigma_X^2$ is a known function of the mean $\mu$ of $X$, say $\sigma_X^2 = \phi(\mu)$, a
transformation of the data that makes the variance almost independent
of the mean is obtained by an argument based on calculus. Let the
transformation be $Y = f(X)$, and let $f'(X)$ denote the derivative of $f(X)$ with
respect to $X$. By a one-term Taylor expansion

$$Y \doteq f(\mu) + f'(\mu)(X - \mu)$$

To this order of approximation, the mean value $E(Y)$ of $Y$ is $f(\mu)$,
since $E(X - \mu) = 0$. With the same approximation, the variance of $Y$ is

$$E\{Y - f(\mu)\}^2 = \{f'(\mu)\}^2 E(X - \mu)^2 = \{f'(\mu)\}^2 \sigma_X^2 = \{f'(\mu)\}^2 \phi(\mu)$$

Hence, to make the variance of $Y$ independent of $\mu$, we choose $f(\mu)$
so that the term on the extreme right above is a constant. This makes
$f(\mu)$ the indefinite integral of $1/\sqrt{\phi(\mu)}$, apart from a constant multiplier.
For the Poisson distribution, this gives $f(\mu) = \sqrt{\mu}$, i.e., $Y = \sqrt{X}$. For the
binomial, the method gives $Y = \arcsin\sqrt{p}$, that is, $Y$ is the angle whose sine
is $\sqrt{p}$. When $f(X)$ has been chosen in this way, the value of the constant
variance on the transformed scale is obtained by finding $\{f'(\mu)\}^2 \phi(\mu)$. For
the Poisson, with $\phi(\mu) = \mu$ and $f(\mu) = \sqrt{\mu}$, we have $f'(\mu) = 1/(2\sqrt{\mu})$, so
that $\{f'(\mu)\}^2 \phi(\mu) = 1/4$. The variance on the transformed scale is 1/4.
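As a check on the result 1/4, the sketch below computes the exact variance of $\sqrt{X}$ for a Poisson variable by summing over its probability function rather than by simulation. The function names are ours, and the point at which the sum is truncated is an assumption chosen to cover essentially all of the probability.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson mean mu, evaluated on the log scale."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def var_sqrt_poisson(mu: float) -> float:
    """Exact variance of sqrt(X) when X is Poisson with mean mu > 0."""
    upper = int(mu + 20 * math.sqrt(mu) + 20)  # tail beyond this is negligible
    e_root = sum(poisson_pmf(k, mu) * math.sqrt(k) for k in range(upper + 1))
    # E(X) = mu exactly, so Var(sqrt X) = E(X) - {E(sqrt X)}^2.
    return mu - e_root ** 2


for mu in (1, 5, 20, 50, 200):
    print(f"mu = {mu:>3}: variance of sqrt(X) = {var_sqrt_poisson(mu):.4f}")
```

The printed values should approach 0.25 as the mean increases; for very small means they depart noticeably from 0.25, in line with the advice in section 11.15 about small counts.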

11.15 — Square root transformation for counts. Counts of rare events, 
such as numbers of defects or of accidents, tend to be distributed
approximately in Poisson fashion. A transformation to $\sqrt{X}$ is often
effective: the variance on the square root scale will be close to 0.25. If some
counts are small, $\sqrt{X + 1}$ or $\sqrt{X} + \sqrt{X + 1}$ (19) stabilizes the variance
more effectively.
TABLE 11.15.1

Number of Poppy Plants in Oats
(Plants per 3 3/4 square feet)

                   Treatment
Block       A      B      C      D      E
1         438    538     77     17     18
2         442    422     61     31     26
3         319    377    157     87     77
4         380    315     52     16     20
Mean      395    413     87     38     35
Range     123    223    105     71     59


The square root transformation can also be used with counts in
which it appears that the variance of $X$ is proportional to the mean of $X$,
that is, $\sigma_X^2 = k\mu$. For a Poisson distribution of errors, $k = 1$, but we
often find $k$ larger than 1, indicating that the distribution of errors has a
variance greater than that of the Poisson.

An example is the record of poppy plants in oats (20) shown in table
11.15.1, where the numbers are large. The differing ranges lead to a
suspicion of heterogeneous variance. If the error mean square were
calculated, it would be too large for testing differences among C, D, E
and too small for A and B.

In table 11.15.2 the square roots of the numbers are recorded and
analyzed. The ranges in the several treatments are now similar. That
there are differences among treatments is obvious; it is unnecessary to
compute F. The 5% LSD value is 3.09, suggesting that D and E are
superior to C while, of course, the C, D, E group is much superior to A
and B in reducing the numbers of undesired poppies.


TABLE 11.15.2

Square Roots of the Poppy Numbers in Table 11.15.1

Block       A      B      C      D      E
1        20.9   23.2    8.8    4.1    4.2
2        21.0   20.5    7.8    5.6    5.1
3        17.9   19.4   12.5    9.3    8.8
4        19.5   17.7    7.2    4.0    4.5
Mean     19.8   20.2    9.1    5.8    5.6
Range     3.1    5.5    5.3    5.3    4.6

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Blocks                        3                 22.65
Treatments                    4                865.44           216.36
Error                        12                 48.69             4.06
Total                        19                936.78


The means in the square root scale are reconverted to the original
scale by squaring. This gives $(19.8)^2 = 392$ plants for A, $(20.2)^2 = 408$
plants for B, and so on. These values are slightly lower than the original
means, 395 for A, 413 for B, etc., because the mean of a set of square roots
is less than the square root of the original mean. As a rough correction
for this discrepancy, add the Error mean square in the square root
analysis to each reconverted mean. In this example we add 4.06, rounded
to 4, giving 396 for A, and so on.
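The square root analysis of table 11.15.2 can be reproduced in a few lines. This sketch (plain Python, variable names of our own) rounds the roots to one decimal as printed, so the sums of squares should agree with the table; it then applies the rough bias correction just described. Because it squares the unrounded treatment means rather than means rounded to 19.8, 20.2, and so on, its reconverted values can differ from the text by about one plant.

```python
import math

# Table 11.15.1: poppy counts, rows = blocks 1-4, columns = treatments A-E.
counts = [
    [438, 538, 77, 17, 18],
    [442, 422, 61, 31, 26],
    [319, 377, 157, 87, 77],
    [380, 315, 52, 16, 20],
]
treatments = "ABCDE"

# Square roots rounded to one decimal, as printed in table 11.15.2.
y = [[round(math.sqrt(x), 1) for x in row] for row in counts]
b, t = len(y), len(y[0])                     # 4 blocks, 5 treatments

grand_total = sum(map(sum, y))
cf = grand_total ** 2 / (b * t)              # correction term
block_ss = sum(sum(row) ** 2 for row in y) / t - cf
treat_totals = [sum(row[j] for row in y) for j in range(t)]
treat_ss = sum(tot ** 2 for tot in treat_totals) / b - cf
total_ss = sum(v * v for row in y for v in row) - cf
error_ss = total_ss - block_ss - treat_ss
error_ms = error_ss / ((b - 1) * (t - 1))

# Reconvert by squaring the mean root, then add the Error mean square
# as the rough bias correction described in the text.
back = {k: (tot / b) ** 2 + error_ms for k, tot in zip(treatments, treat_totals)}
print(f"blocks SS = {block_ss:.2f}, treatments MS = {treat_ss / (t - 1):.2f}, "
      f"error MS = {error_ms:.2f}")
print({k: round(v) for k, v in back.items()})
```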

A transformation like the square root affects both the shape of the
frequency distribution of the errors and the meaning of additivity. If
treatment and block effects are additive in the original scale, they will not
be additive in the square root scale, and vice versa. However, unless
treatment and block effects are both large, effects that are additive in one
scale will be approximately so in the other, since the square root
transformation is a mild one.


EXAMPLE 11.15.1— The numbers of wireworms counted in the plots of a Latin square
(21) following soil fumigations in the previous year were

                       Columns
Rows       1      2      3      4      5
1        P 3    O 2    N 5    K 1    M 4
2        M 6    K 0    O 6    N 4    P 4
3        O 4    M 9    K 1    P 6    N 5
4        N 17   P 8    M 8    O 9    K 0
5        K 4    N 4    P 2    M 4    O 8

Since these are such small numbers, transform to $\sqrt{X + 1}$. The first number, 3,
becomes $\sqrt{3 + 1} = 2$, etc.

Analyze the variance. Ans. Mean square for Treatments, 1.4457; for Error, 0.3259.
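A sketch of the analysis in example 11.15.1, together with the reconversion asked for in example 11.15.3. It uses unrounded square roots, so the mean squares may differ from the printed answers in the third decimal place. The data layout follows the Latin square above; the variable names are ours.

```python
import math

# Example 11.15.1: (treatment, wireworm count) for rows 1-5, columns 1-5.
layout = [
    [("P", 3), ("O", 2), ("N", 5), ("K", 1), ("M", 4)],
    [("M", 6), ("K", 0), ("O", 6), ("N", 4), ("P", 4)],
    [("O", 4), ("M", 9), ("K", 1), ("P", 6), ("N", 5)],
    [("N", 17), ("P", 8), ("M", 8), ("O", 9), ("K", 0)],
    [("K", 4), ("N", 4), ("P", 2), ("M", 4), ("O", 8)],
]
n = len(layout)
y = [[math.sqrt(count + 1) for _, count in row] for row in layout]  # sqrt(X + 1)

grand_total = sum(map(sum, y))
cf = grand_total ** 2 / n ** 2                       # correction term
row_ss = sum(sum(row) ** 2 for row in y) / n - cf
col_ss = sum(sum(y[i][j] for i in range(n)) ** 2 for j in range(n)) / n - cf
treat_totals = {}
for i, row in enumerate(layout):
    for j, (trt, _) in enumerate(row):
        treat_totals[trt] = treat_totals.get(trt, 0.0) + y[i][j]
treat_ss = sum(tot ** 2 for tot in treat_totals.values()) / n - cf
total_ss = sum(v * v for row in y for v in row) - cf
error_ss = total_ss - row_ss - col_ss - treat_ss

treat_ms = treat_ss / (n - 1)
error_ms = error_ss / ((n - 1) * (n - 2))
# Example 11.15.3: square the mean root and subtract 1; optionally add error_ms.
back = {trt: (tot / n) ** 2 - 1 for trt, tot in sorted(treat_totals.items())}
print(f"treatments MS = {treat_ms:.4f}, error MS = {error_ms:.4f}")
print({trt: (round(v, 2), round(v + error_ms, 2)) for trt, v in back.items()})
```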

EXAMPLE 11.15.2— Calculate the Studentized range D = 1.06 and show that K gave
significantly fewer wireworms than M, N, and O.

EXAMPLE 11.15.3— Estimate the average numbers of wireworms per plot for the
several treatments. Ans. (with no bias correction) K, 0.99; M, 6.08; N, 6.40; O, 5.55;
P, 4.38. To make the bias correction, add 0.33, giving K = 1.32, M = 6.41, etc.

EXAMPLE 11.15.4— If the error variance of X in the original scale is k times the mean
of X and if effects are additive in the square root scale, it can be shown that the true error
variance in the square root scale is approximately k/4. Thus the value of k can be estimated
from the analysis in the square root scale. If k is close to 1, this suggests that the distribution
of errors in the original scale may be close to the Poisson distribution. In example 11.15.1,
k is about 4(0.3259) = 1.3, suggesting that most of the variance in the original scale is of the
Poisson type. With the poppy plants (table 11.15.2), k is about 16, indicating a variance
much greater than the Poisson.


11.16 — Arcsin transformation for proportions. This transformation,
also called the angular transformation, was developed for binomial
proportions. If $a_{ij}$ successes out of $n$ are obtained in the $j$th replicate of the
$i$th treatment, the proportion $p_{ij} = a_{ij}/n$ has variance $p_{ij}(1 - p_{ij})/n$. By
means of table A 16, due to C. I. Bliss, we replace $p_{ij}$ by the angle whose
sine is $\sqrt{p_{ij}}$. In the angular scale, proportions near 0 or 1 are spread out
so as to increase their variance. If all the error variance is binomial, the
error variance in the angular scale is about $821/n$. The transformation
does not remove inequalities in variance arising from differing values of
$n$. If the $n$'s vary widely, a weighted analysis in the angular scale is
advisable.

With $n < 50$, a zero proportion should be counted as $1/(4n)$ before
transforming to angles, and a 100% proportion as $(n - 1/4)/n$. This
empirical device, suggested by Bartlett (22), improves the equality of 
variance in the angles. A more accurate transformation for small n has 
been tabulated by Mosteller and Youtz (19). 
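A minimal sketch of the angular transformation in degrees, including Bartlett's adjustment for 0% and 100% when n < 50. It also evaluates the constant in the binomial variance 821/n: the derivative of $\arcsin\sqrt{p}$ is $1/\{2\sqrt{p(1 - p)}\}$, so the variance in radians is about $1/(4n)$, and converting to degrees multiplies by $(180/\pi)^2$. The function name is ours, and table A 16 itself is not reproduced.

```python
import math


def angle_degrees(successes: int, n: int) -> float:
    """Angle (in degrees) whose sine is the square root of the proportion.

    Bartlett's adjustment is applied for n < 50: a zero proportion is
    counted as 1/(4n) and a 100% proportion as (n - 1/4)/n.
    """
    p = successes / n
    if n < 50:
        if successes == 0:
            p = 1 / (4 * n)
        elif successes == n:
            p = (n - 0.25) / n
    return math.degrees(math.asin(math.sqrt(p)))


# Binomial variance in the angular scale: (180/pi)^2 / (4n) square degrees.
DEGREE_CONSTANT = math.degrees(1) ** 2 / 4   # about 820.7, the "821" of the text

print(round(DEGREE_CONSTANT, 1))
print([round(angle_degrees(a, 20), 1) for a in (0, 1, 10, 19, 20)])
```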

Angles may also be used with proportions that are subject to other 
sources of variation in addition to the binomial, if it is thought that the 
variance of $p_{ij}$ is some multiple of $p_{ij}(1 - p_{ij})$. Since, however, this
product varies little for $p_{ij}$ lying between 30% and 70%, the angular
transformation is scarcely needed if nearly all the observed $p_{ij}$ lie in this range.
In fact, this transformation is unlikely to produce a noticeable change in
the conclusions unless the $p_{ij}$ range from near zero to 30% and beyond
(or from below 70% to 100%). 

Table 11.16.1, taken from a larger randomized blocks experiment
(23), shows the percentages of unsalable ears of corn, the treatments being
a control, A, and three mechanical methods of protecting against damage
by corn earworm larvae. The value of n, about 36, was not constant, but
its variations were fairly small and are ignored. Note that the per cents
range from 2.1% to 55.5%.

TABLE 11.16.1

Percentage of Unsalable Ears of Corn

                                Block
Treatments      1      2      3      4      5      6
A            42.4   34.3   24.1   39.5   55.5   49.1
B            33.3   33.3    5.0   26.3   30.2   28.6
C             8.5   21.9    6.2   16.0   13.5   15.4
D            16.6   19.3   16.6    2.1   11.1   11.1

                  Angle = Arcsin √Proportion
Treatments      1      2      3      4      5      6    Mean   Mean %
A            40.6   35.8   29.4   38.9   48.2   44.5   39.6    40.6
B            35.2   35.2   12.9   30.9   33.3   32.3   29.9    24.9
C            17.0   27.9   14.4   23.6   21.6   23.1   21.3    13.2
D            24.0   26.1   24.0    8.3   19.5   19.5   20.2    11.9

Analysis of Variance in Angles

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Blocks                        5                 359.8
Treatments                    3               1,458.5           486.2
Error                        15                 546.1            36.4
Total                        23               2,364.4

In the analysis of variance of the angles (table 11.16.1), the Error 
mean square was 36.4. Since 821/n = 821/36 = 22.8, some variation in 
excess of the binomial may be present. The F-value for treatments is 
large. The 5% LSD for comparing two treatments is 7.4. B, C, and D 
were all superior to the control A, while C and D were superior to B. The 
angle means are retranslated to per cents at the right of the table. 
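A sketch of the LSD comparison and the reconversion of mean angles to percentages, assuming the printed Error mean square (36.4 with 15 d.f.) and six replications. The 5% t value for 15 d.f., 2.131, is taken from a standard t table and hard-coded; it is not quoted in the text.

```python
import math

# Mean angles (degrees) for treatments A-D from table 11.16.1.
mean_angles = {"A": 39.6, "B": 29.9, "C": 21.3, "D": 20.2}
error_ms, blocks = 36.4, 6   # Error mean square (15 d.f.) and replications
t_05_15 = 2.131              # two-tailed 5% t value for 15 d.f., from a t table

lsd = t_05_15 * math.sqrt(2 * error_ms / blocks)
# Reconvert a mean angle to a percentage: p = sin^2(angle).
percent = {k: 100 * math.sin(math.radians(a)) ** 2 for k, a in mean_angles.items()}
names = sorted(mean_angles)
different = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
             if abs(mean_angles[a] - mean_angles[b]) > lsd]
print(f"LSD = {lsd:.1f}")
print({k: round(v, 1) for k, v in percent.items()})
print(different)
```

Only the pair C, D fails to exceed the LSD, matching the conclusions stated above. Reconverting the rounded mean angle for B gives about 24.8% rather than the tabulated 24.9%, because the table works from the unrounded mean angle.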

11.17 — The logarithmic transformation. Logarithms are used to stabi- 
lize the variance if the standard deviation in the original scale varies di- 
rectly as the mean ; in other words, if the coefficient of variation is constant. 
There are mathematical reasons why this type of relation between standard 
deviation and mean is likely to be found when the effects are proportional 
rather than additive; for example, when treatment 2 gives results
consistently 23% higher than treatment 1 rather than results higher by, say,
18 units. In this situation the log transformation may bring about both
additivity of effects and equality of variance. If some zero values of X occur,
log(X + 1) is often used.


TABLE 11.17.1

Estimated Numbers of Four Kinds of Plankton (I, ..., IV) Caught in Six Hauls
With Each of Two Nets

               Estimated Numbers                        Logarithms
Haul       I       II      III       IV          I      II     III     IV
1        895    1,520   43,300   11,000       2.95   3.18   4.64   4.04
2        540    1,610   32,800    8,600       2.73   3.21   4.52   3.93
3      1,020    1,900   28,800    8,260       3.01   3.28   4.46   3.92
4        470    1,350   34,600    9,830       2.67   3.13   4.54   3.99
5        428      980   27,800    7,600       2.63   2.99   4.44   3.88
6        620    1,710   32,800    9,650       2.79   3.23   4.52   3.98
7        760    1,930   28,100    8,900       2.88   3.29   4.45   3.95
8        537    1,960   18,900    6,060       2.73   3.29   4.28   3.78
9        845    1,840   31,400   10,200       2.93   3.26   4.50   4.01
10     1,050    2,410   39,500   15,500       3.02   3.38   4.60   4.19
11       387    1,520   29,000    9,250       2.59   3.18   4.46   3.97
12       497    1,685   22,300    7,900       2.70   3.23   4.35   3.90
Mean     671    1,701   30,775    9,396      2.802  3.221  4.480  3.962
Range    663    1,480   24,400    9,440       0.43   0.39   0.36   0.41

Analysis of Variance of Logarithms

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Kind of plankton              3                20.2070          6.7357
Haul                         11                 0.3387          0.0308
Discrepance                  33                 0.2300          0.0070
The plankton catches (24) of table 11.17.1 yielded nicely to the log
transformation. The original ranges and means for the four kinds of 
plankton were nearly proportional, the ratios of range to mean being 
0.99, 0.87, 0.79, and 1.00. After transformation the ranges were almost 
equal and uncorrelated with the means. 

Transforming back, the estimated mean numbers caught for the four 
kinds of plankton are antilog 2.802 = 634; 1,663; 30,200; and 9,162. 
These are geometric means. 

The means of the logs will be found to differ significantly for all 
four kinds of plankton. The standard deviation of the logarithms is
$\sqrt{0.0070} = 0.084$, and the antilogarithm of this number is 1.21. Quoting
Winsor and Clark (page 5), “Now a deviation of 0.084 in the
logarithms of the catch means that the catch has been multiplied (or 
divided) by 1.21. Hence we may say that one standard deviation in the 
logarithm corresponds to a percentage standard deviation, or coefficient 
of variation, of 21% in the catch.” 
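The reconversion to geometric means and the interpretation of the standard deviation of the logs take only a few lines. This sketch starts from the printed mean logarithms and the Discrepance mean square rather than from the raw counts.

```python
import math

mean_logs = {"I": 2.802, "II": 3.221, "III": 4.480, "IV": 3.962}
discrepance_ms = 0.0070            # error ("Discrepance") mean square of the logs

geometric_means = {k: 10 ** m for k, m in mean_logs.items()}
s_log = math.sqrt(discrepance_ms)  # standard deviation in the log scale
factor = 10 ** s_log               # one s.d. multiplies or divides the catch by this
print({k: round(v) for k, v in geometric_means.items()})
print(f"s = {s_log:.3f}, factor = {factor:.2f}, roughly {100 * (factor - 1):.0f}% variation")
```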

EXAMPLE 11.17.1— The following data were abstracted from an experiment (27)
which was more complicated in design. Each entry is the geometric mean of insect catches
by a trap in three successive nights, one night at each of three locations. Three types of
trap are compared over five three-night periods. The insects are macrolepidoptera at
Rothamsted Experimental Station:

                   3-Night Periods, August, 1950
Trap      16-18    19-21    22-24    25-27    28-30
1          19.1     23.4     29.5     23.4     16.6
2          50.1    166.1    223.9     58.9     64.6
3         123.0    407.4    398.1    229.1    251.2


Williams found the log transformation effective in analyzing highly variable data like 
these. Transform to logarithms and analyze their variance. Ans. Mean square for traps 
= 1.4455; for error, 0.0172. 

Show that all differences between trap means are significant and that the geometric 
means for traps are 21.9, 93.3, and 257.0 insects. 
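A sketch of the analysis requested in example 11.17.1, using common (base 10) logarithms of the tabulated catches. Because it carries full precision, its mean squares and geometric means may differ slightly in the last figures from the printed answers.

```python
import math

# Example 11.17.1: rows = traps 1-3, columns = the five 3-night periods.
catches = [
    [19.1, 23.4, 29.5, 23.4, 16.6],
    [50.1, 166.1, 223.9, 58.9, 64.6],
    [123.0, 407.4, 398.1, 229.1, 251.2],
]
y = [[math.log10(x) for x in row] for row in catches]
r, c = len(y), len(y[0])

cf = sum(map(sum, y)) ** 2 / (r * c)              # correction term
trap_ss = sum(sum(row) ** 2 for row in y) / c - cf
period_ss = sum(sum(y[i][j] for i in range(r)) ** 2 for j in range(c)) / r - cf
total_ss = sum(v * v for row in y for v in row) - cf
error_ss = total_ss - trap_ss - period_ss

trap_ms = trap_ss / (r - 1)
error_ms = error_ss / ((r - 1) * (c - 1))
geo_means = [10 ** (sum(row) / c) for row in y]   # antilog of the mean log
print(f"traps MS = {trap_ms:.4f}, error MS = {error_ms:.4f}")
print([round(g, 1) for g in geo_means])
```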

11.18 — Non-additivity. Suppose that in a two-way classification, with 
2 rows and 2 columns, the effects of rows and columns are proportional 
or multiplicative instead of additive. In each row, column B exceeds 
column A by a fixed percentage, while in each column, row 2 exceeds row 
1 by a fixed percentage. Consider column percentages of 20% and 100% 
and row percentages of 10% and 50%. These together provide four com- 
binations. Taking the observation in column A, row 1, as 1.0, the other 
observations are shown in table 11.18.1 for the four cases.

Thus, in case 1, the value of 1.32 for B in row 2 is 1.1 x 1.2. Since 
no experimental error has been added, the error mean square in a correct 
analysis should be zero. The correct procedure is to transform the data 
to logs before analysis. In logs the effects become additive, and the error 
mean square is zero. From the analysis in logs, we learn that B exceeds 
A by exactly 20% in cases 1 and 2, and by exactly 100% in cases 3 and 4. 
TABLE 11.18.1

Hypothetical Data for Four Cases With Multiplicative Effects
(C: column percentage; R: row percentage)

            Case 1          Case 2          Case 3          Case 4
            C 20%           C 20%           C 100%          C 100%
            R 10%           R 50%           R 10%           R 50%
Row        A      B        A      B        A      B        A      B
1         1.0    1.2      1.0    1.2      1.0    2.0      1.0    2.0
2         1.1    1.32     1.5    1.8      1.1    2.2      1.5    3.0
Means     1.05   1.26     1.25   1.50     1.05   2.10     1.25   2.50
s            0.01            0.05            0.05            0.25
s/X̄          0.9%            3.6%            3.2%           13.3%


If the usual analysis of variance is carried out in the original scale, the
standard error s per observation (with 1 d.f.) is shown under each case.
With 2 replications, s is also the s.e. of the difference B − A.
Consequently, in case 1 we would conclude from this analysis that B − A is 0.21
with a standard error of ±0.01. In case 4 we conclude that B − A
= 1.25 ± 0.25. The standard errors, ±0.01 and ±0.25, are entirely a
result of the fact that we used the wrong model for analysis. In a real
situation where experimental errors are also present, this variance $s^2$ due
to non-additivity is added to the ordinary experimental error variance $\sigma^2$.

To generalize, the analysis in the original scale has two defects. It 
fails to discover the simple proportional nature of the relationship be- 
tween row and column effects. It also suffers a loss of precision, since the 
error variance is inflated by the component due to non-additivity. If 
row and column effects are both small, these deficiencies are usually not
serious. In case 1, for example, the standard error s due to non-additivity
only is 0.9% of the mean. If the ordinary standard error σ were 5% of the
mean (a low value for most data), the non-additivity would increase this
only to $\sqrt{25.81}$, or 5.1%. The loss of precision from non-additivity is
greater in cases 2 and 3 and jumps markedly in case 4 in which both row 
and column effects are large. 
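The standard errors in table 11.18.1 can be checked directly. For a 2 x 2 table with one observation per cell, the residual sum of squares after fitting additive row and column effects has 1 d.f. and equals (interaction contrast)²/4, so s is half the absolute interaction contrast. The helper below is our own; it also confirms that the interaction vanishes in logarithms and reproduces the √25.81 calculation.

```python
import math


def residual_sd_2x2(a1, b1, a2, b2):
    """Residual s (1 d.f.) after fitting additive row and column effects to a
    2 x 2 table: row 1 = (a1, b1), row 2 = (a2, b2), one observation per cell.

    Each residual is +/- (interaction contrast)/4, so the residual S.S. is
    contrast^2 / 4 and s = |contrast| / 2.
    """
    contrast = a1 - b1 - a2 + b2
    return abs(contrast) / 2


cases = {  # (row 1: A, B, row 2: A, B) from table 11.18.1
    1: (1.0, 1.2, 1.1, 1.32),
    2: (1.0, 1.2, 1.5, 1.8),
    3: (1.0, 2.0, 1.1, 2.2),
    4: (1.0, 2.0, 1.5, 3.0),
}
for case, cells in cases.items():
    s = residual_sd_2x2(*cells)
    s_log = residual_sd_2x2(*(math.log10(v) for v in cells))
    cv = 100 * s / (sum(cells) / 4)
    print(f"case {case}: s = {s:.2f} ({cv:.1f}% of mean), s in logs = {s_log:.1e}")

# Ordinary error of 5% combined with 0.9% from non-additivity (case 1).
print(round(math.sqrt(5.0 ** 2 + 0.9 ** 2), 1))
```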

11.19 — Tukey’s test of additivity. This is useful in a variety of ways:
(i) to help decide if a transformation is necessary; (ii) to suggest a suitable
transformation; (iii) to learn if a transformation has been successful in
producing additivity (28, 29).

The test is related to transformations of the form $Y = X^p$, in which
$X$ is the original scale, and we are seeking a power $p$ of $X$ such that effects
are additive in the scale of $Y = X^p$. Thus, $p = 1/2$ represents the square
root transformation and $p = -1$ a reciprocal transformation, analyzing
$1/X$ instead of $X$. The value $p = 0$ is interpreted as a log transformation,
because the variable $X^p$ behaves like $\log X$ when $p$ is small.

The rationale of the test can be indicated by means of calculus. For
the two-way classification, if effects are exactly additive in the scale of $Y$,
we have

$$Y_{ij} = \bar{Y}_{..} + (\bar{Y}_{i.} - \bar{Y}_{..}) + (\bar{Y}_{.j} - \bar{Y}_{..}) = \bar{Y}_{..}\left[1 + \{(\bar{Y}_{i.} - \bar{Y}_{..}) + (\bar{Y}_{.j} - \bar{Y}_{..})\}/\bar{Y}_{..}\right]$$

We suppose that row and column effects are small relative to the
mean. This implies that $\alpha_i = (\bar{Y}_{i.} - \bar{Y}_{..})/\bar{Y}_{..}$ and $\beta_j = (\bar{Y}_{.j} - \bar{Y}_{..})/\bar{Y}_{..}$ are
both small.

Write $X_{ij} = Y_{ij}^{1/p}$ and expand in the usual Taylor’s series. This gives

$$X_{ij} = \bar{Y}_{..}^{1/p}\,[1 + \alpha_i + \beta_j]^{1/p} = \bar{Y}_{..}^{1/p}\left[1 + \frac{1}{p}(\alpha_i + \beta_j) + \frac{1}{2p}\left(\frac{1}{p} - 1\right)(\alpha_i^2 + 2\alpha_i\beta_j + \beta_j^2) + \cdots\right]$$

Now, in the $X$ scale the terms in $\alpha_i$, $\alpha_i^2$ represent row effects and the terms
in $\beta_j$, $\beta_j^2$ represent column effects that are added together in the above
expression. These terms are therefore still additive in the $X$ scale. The
first non-additive term is the one in $\alpha_i\beta_j$. Written in full, this term is

$$\bar{Y}_{..}^{1/p}(1 - p)(\bar{Y}_{i.} - \bar{Y}_{..})(\bar{Y}_{.j} - \bar{Y}_{..})/(p^2\bar{Y}_{..}^2) \qquad (11.19.1)$$

For our purpose we need to write this expression in terms of $X$ rather
than $Y$. By new single-term Taylor expansions we have, since $Y = X^p$,

$$\bar{Y}_{i.} - \bar{Y}_{..} \doteq p\bar{X}_{..}^{\,p-1}(\bar{X}_{i.} - \bar{X}_{..}); \qquad \bar{Y}_{.j} - \bar{Y}_{..} \doteq p\bar{X}_{..}^{\,p-1}(\bar{X}_{.j} - \bar{X}_{..})$$

Substitution into (11.19.1) gives for the first non-additive term in $X_{ij}$

$$(1 - p)\bar{Y}_{..}^{1/p}(\bar{X}_{i.} - \bar{X}_{..})(\bar{X}_{.j} - \bar{X}_{..})\bar{X}_{..}^{\,2p-2}/\bar{Y}_{..}^2$$

Using $\bar{Y}_{..} \doteq \bar{X}_{..}^{\,p}$, this term may be expressed approximately as

$$\frac{1 - p}{\bar{X}_{..}}(\bar{X}_{i.} - \bar{X}_{..})(\bar{X}_{.j} - \bar{X}_{..}) \qquad (11.19.2)$$

Since this term represents a non-additive effect of rows and columns, it
will appear in the residual of $X_{ij}$ when an additive model is fitted in the $X$
scale. The conclusions from this rough argument are as follows:

1. If this type of non-additivity is present in $X$, and $\hat{X}_{ij}$ is the fitted
value given by the additive model, the residual $X_{ij} - \hat{X}_{ij}$ has a linear
regression on the variate $(\bar{X}_{i.} - \bar{X}_{..})(\bar{X}_{.j} - \bar{X}_{..})$.

2. The regression coefficient $B$ is an estimate of $(1 - p)/\bar{X}_{..}$. Thus,
the power $p$ to which $X$ must be raised to produce additivity is estimated
by $1 - B\bar{X}_{..}$. Commenting on this result, Anscombe and Tukey (13)
state (their $k$ is our $B/2$): “It is important to emphasize that the available
data rarely define the ‘correct’ value of $p$ with any precision. Repeating
the analysis and calculation of $k$ for each of a number of values of $p$ may
show the range of values of $p$ clearly compatible with the observations, but
experience and subject-matter insight are important in choosing a $p$ for
final analysis.”
3. Tukey’s test is a test of the null hypothesis that the population
value of B is zero. A convenient way of computing B and making the
test is illustrated by the data in table 11.19.1. The data are average insect
catches in three traps over five periods. The same data were presented
in example 11.17.1 as an exercise on the log transformation. We now
consider the additivity of trap and period effects in the original scale. The
steps are as follows (see table 11.19.1 for calculations):


TABLE 11.19.1

Macrolepidoptera Catches by Three Traps in Five Periods
(Calculations for test of additivity)

                    Trap
Period        1        2         3      Sum X_i.   Mean X̄_i.     d_i    w_i = Σ_j X_ij d_j
1           19.1     50.1     123.0      192.2       64.1       -74.9         14,025
2           23.4    166.1     407.4      596.9      199.0       +60.0         51,096
3           29.5    223.9     398.1      651.5      217.2       +78.2         47,543
4           23.4     58.9     229.1      311.4      103.8       -35.2         28,444
5           16.6     64.6     251.2      332.4      110.8       -28.1         32,243
Sum X.j    112.0    563.6   1,408.8    2,084.4                           Σ w_i = 173,351
Mean X̄.j    22.4    112.7     281.8                139.0
d_j       -116.6    -26.2    +142.8                             0.0

(i) Find $d_i = \bar{X}_{i.} - \bar{X}_{..}$ and $d_j = \bar{X}_{.j} - \bar{X}_{..}$, both adding exactly to zero.
(ii) $w_1 = (19.1)(-116.6) + (50.1)(-26.2) + (123.0)(+142.8) = 14{,}025$
     $w_5 = (16.6)(-116.6) + (64.6)(-26.2) + (251.2)(+142.8) = 32{,}243$
     Check: $173{,}351 = (112.0)(-116.6) + (563.6)(-26.2) + (1{,}408.8)(+142.8)$
     $N = \sum_i w_i d_i = (14{,}025)(-74.9) + \cdots + (32{,}243)(-28.1) = 3.8259 \times 10^6$
(iii) $\sum d_i^2 = (-74.9)^2 + \cdots + (-28.1)^2 = 17{,}354$
     $\sum d_j^2 = (-116.6)^2 + \cdots + (+142.8)^2 = 34{,}674$
     $D = (\sum d_i^2)(\sum d_j^2) = (17{,}354)(34{,}674) = 601.7 \times 10^6$
(iv) S.S. for non-additivity $= N^2/D = (3.8259)^2(10^{12})/\{(601.7)(10^6)\} = 24{,}327$


(i) Calculate $d_i = \bar{X}_{i.} - \bar{X}_{..}$ and $d_j = \bar{X}_{.j} - \bar{X}_{..}$, rounding if
necessary so that both sets add exactly to zero.

(ii) Compute $w_i = \sum_j X_{ij} d_j$ and record them in the extreme right
column. Then find

$$N = \sum_i w_i d_i = \sum_i \sum_j X_{ij} d_i d_j$$

N is the numerator of B.

(iii) The denominator D of B is $(\sum d_i^2)(\sum d_j^2)$. Thus, $B = N/D$.

(iv) The contribution of non-additivity to the error sum of squares
of X is $N^2/D$, with 1 d.f. This is tested by an F-test against the remainder
of the Error S.S., which has $\{(r - 1)(c - 1) - 1\}$ d.f. The test is made in
table 11.19.2.

TABLE 11.19.2

Analysis of Variance and Test of Additivity

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Periods                       4                 52,066
Traps                         2                173,333
Error                         8                 30,607
  Non-additivity              1                 24,327           24,327
  Remainder                   7                  6,280              897

F = 24,327/897 = 27.1, d.f. = 1, 7. P < 0.01

The hypothesis of additivity is untenable. What type of transforma- 
tion is suggested by this test? 


$$B = \frac{N}{D} = \frac{3.8259}{601.7} = 0.006358$$

$$p = 1 - B\bar{X}_{..} = 1 - (0.006358)(139.0) = 1 - 0.88 = 0.12$$


The test suggests a one-tenth power of X. This behaves very much 
like log X. 
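Steps (i) to (iv) can be collected into one function. This sketch uses unrounded deviations $d_i$ and $d_j$ (the text rounds them to one decimal), so N, D, and the non-additivity S.S. come out slightly different from the printed $3.8259 \times 10^6$, $601.7 \times 10^6$, and 24,327, while the conclusion is unchanged. The function name `tukey_one_df` and the dictionary keys are ours.

```python
def tukey_one_df(x):
    """Tukey's one-degree-of-freedom test for non-additivity in a two-way
    table x[i][j] with one observation per cell. Returns N, D, the
    non-additivity S.S., the error S.S., F, B and the suggested power p."""
    r, c = len(x), len(x[0])
    grand = sum(map(sum, x)) / (r * c)
    d_i = [sum(row) / c - grand for row in x]
    d_j = [sum(x[i][j] for i in range(r)) / r - grand for j in range(c)]
    n_num = sum(x[i][j] * d_i[i] * d_j[j] for i in range(r) for j in range(c))
    d_den = sum(v * v for v in d_i) * sum(v * v for v in d_j)
    ss_nonadd = n_num ** 2 / d_den
    # Error S.S. of the additive model: residual = x_ij - grand - d_i - d_j.
    error_ss = sum((x[i][j] - grand - d_i[i] - d_j[j]) ** 2
                   for i in range(r) for j in range(c))
    remainder_df = (r - 1) * (c - 1) - 1
    f_ratio = ss_nonadd / ((error_ss - ss_nonadd) / remainder_df)
    b = n_num / d_den
    return {"N": n_num, "D": d_den, "ss_nonadd": ss_nonadd, "error_ss": error_ss,
            "F": f_ratio, "B": b, "p": 1 - b * grand}


# Table 11.19.1: rows are periods 1-5, columns are traps 1-3.
catches = [
    [19.1, 50.1, 123.0],
    [23.4, 166.1, 407.4],
    [29.5, 223.9, 398.1],
    [23.4, 58.9, 229.1],
    [16.6, 64.6, 251.2],
]
res = tukey_one_df(catches)
print({k: round(v, 4) for k, v in res.items()})
```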

11.20 — Non-additivity in a Latin square. If the mathematical analysis
of the previous section is carried out for a Latin square, the first
non-additive term, corresponding to equation 11.19.2, is, as might be guessed,

$$\frac{1 - p}{\bar{X}_{...}}\left\{(\bar{X}_{i..} - \bar{X}_{...})(\bar{X}_{.j.} - \bar{X}_{...}) + (\bar{X}_{i..} - \bar{X}_{...})(\bar{X}_{..k} - \bar{X}_{...}) + (\bar{X}_{.j.} - \bar{X}_{...})(\bar{X}_{..k} - \bar{X}_{...})\right\}$$

Consequently, the test for additivity is carried out by finding the regression
of $(X_{ijk} - \hat{X}_{ijk})$ on the variate in braces above, as illustrated in (28). Note
that D is the error sum of squares of this variate.

We shall, instead, illustrate an alternative method of doing the com- 
putations, due to Tukey (29), that generalizes to other classifications. 
Table 11.20.1 comes from an experiment on monkeys (30), the raw data 
being the number of responses to auditory or visual stimuli administered 
under five conditions (A, ..., E). Each pair of monkeys received one type
of stimulus per week, the order from week to week being determined by 
the randomized columns of the Latin square. 

It was discovered that the standard deviation of the number of 
responses was almost directly proportional to the mean, so the counts were 
transformed to logs. Each entry in the table is the mean of the log counts 
for the two members of a pair. Has additivity been attained? 
TABLE 11.20.1

Logs of Numbers of Responses by Pairs of Monkeys Under Five Stimuli
(Test of additivity in a Latin square)

Each line gives the pair (row), week (column), stimulus, the observed value X_ijk,
the fitted value X̂_ijk, the residual d_ijk, and U_ijk.

Pair   Week   Stimulus   X_ijk    X̂_ijk     d_ijk     U_ijk
1      1      B          1.99     2.022    -0.032       37
1      2      D          2.25     2.268    -0.018        3
1      3      C          2.18     2.220    -0.040        0
1      4      A          2.18     2.084     0.098*      17
1      5      E          2.51     2.518    -0.008       92
2      1      D          2.00     1.950     0.052*      70
2      2      B          1.85     1.932    -0.082       80
2      3      A          1.79     1.852    -0.062      132
2      4      E          2.14     2.152    -0.012        4
2      5      C          2.31     2.206     0.104        0
3      1      C          2.17     2.132     0.038        7
3      2      A          2.10     2.082     0.018       18
3      3      E          2.34     2.348    -0.006*      18
3      4      B          2.20     2.178     0.022        1
3      5      D          2.40     2.472    -0.072       66
4      1      E          2.41     2.456    -0.046       58
4      2      C          2.47     2.462     0.010*      61
4      3      B          2.44     2.366     0.074       23
4      4      D          2.53     2.526     0.004       97
4      5      A          2.44     2.482    -0.042       71
5      1      A          1.85     1.862    -0.012      125
5      2      E          2.32     2.248     0.072        1
5      3      D          2.21     2.176     0.034        2
5      4      C          2.05     2.162    -0.112        3
5      5      B          2.25     2.234     0.018*       0

Pair means X̄_i..:       1: 2.222   2: 2.018   3: 2.242   4: 2.458   5: 2.136
Week means X̄_.j.:       1: 2.084   2: 2.198   3: 2.192   4: 2.220   5: 2.382
Stimulus means X̄_..k:   A: 2.072   B: 2.146   C: 2.236   D: 2.278   E: 2.344
Grand mean X̄_...:       2.215

* Denotes deviations that were adjusted in order to make the deviations add to zero
over every row, column, and treatment.


The steps follow. 

1. Find the row, column, and treatment means, as shown, and the
fitted values $\hat{X}_{ijk}$ by the additive model

$$\hat{X}_{ijk} = \bar{X}_{i..} + \bar{X}_{.j.} + \bar{X}_{..k} - 2\bar{X}_{...}$$

For E in row 2, column 4,

$$\hat{X}_{245} = 2.018 + 2.220 + 2.344 - 2(2.215) = 2.152$$

2. Find the residuals $d_{ijk} = X_{ijk} - \hat{X}_{ijk}$ as shown, adjusting if
necessary so that the sums are zero over every row, column, and treatment.
Values that were adjusted are denoted by an * in table 11.20.1.

3. Construct the 25 values of a variate $U_{ijk} = c_1(\hat{X}_{ijk} - c_2)^2$, where
$c_1$ and $c_2$ are any two convenient constants. We took $c_2 = \bar{X}_{...} = 2.215$,
which is often suitable, and $c_1 = 1000$, so that the U's are mostly
between 0 and 100. For B in row 1, column 1,

$$U_{112} = 1000(2.022 - 2.215)^2 = 37$$

4. Calculate the regression coefficient of the $d_{ijk}$ on the residuals of
the $U_{ijk}$. The numerator is

$$N = \sum d_{ijk} U_{ijk} = (-0.032)(37) + \cdots + (0.018)(0) = -20.356$$

The denominator D is the error sum of squares of the $U_{ijk}$. This is found
by performing the ordinary Latin square analysis of the $U_{ijk}$. The value
of D is 22,330.

5. To perform the test for additivity, find the S.S., 0.0731, of the
$d_{ijk}$, which equals the error S.S. of the $X_{ijk}$. The contribution due to
non-additivity is $N^2/D = (-20.356)^2/22{,}330 = 0.0186$. Finally, compare the
mean square for Non-additivity with the Remainder mean square.



Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square      F
Error S.S.                   12                 0.0731
  Non-additivity              1                 0.0186          0.0186     3.76 (P = 0.08)
  Remainder                  11                 0.0545          0.00495


The value of P is 0.08 — a little lew, though short of the 5% level. 
Since the interpretations are not critical (examples 11.20.4, 11.20.5), the 
presence of slight non-additivity should not affect them. 

The above procedure applies also in more complex classifications. Note that if we expand the quadratic $c_1(\hat{X}_{ijk} - \bar{X}_{...})^2$, the coefficient of terms like $(\bar{X}_{i..} - \bar{X}_{...})(\bar{X}_{.j.} - \bar{X}_{...})$ is $2c_1$. Hence the regression coefficient $B$ of the previous section is $B = 2c_1N/D$. If a power transformation is needed, the suggested power is as before $p = 1 - B\bar{X}_{...}$.
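The arithmetic of steps 4 and 5 can be checked in a few lines of Python. This is a minimal sketch, not part of the original method: it takes N, D, and the error S.S. and d.f. quoted above as inputs (the 25 residuals of table 11.20.1 are not reproduced here), and the function name `nonadditivity_f` is our own.

```python
def nonadditivity_f(n_numerator, d_denominator, error_ss, error_df):
    """One-d.f. test for non-additivity from N, D and the error line.

    Returns (S.S. for non-additivity, remainder S.S., remainder mean square, F).
    """
    ss_nonadd = n_numerator ** 2 / d_denominator   # N^2 / D, with 1 d.f.
    remainder_ss = error_ss - ss_nonadd            # what is left of the error S.S.
    remainder_df = error_df - 1
    remainder_ms = remainder_ss / remainder_df
    return ss_nonadd, remainder_ss, remainder_ms, ss_nonadd / remainder_ms


# Figures quoted in the text for the 5 x 5 Latin square (12 error d.f.).
ss_na, rem_ss, rem_ms, f = nonadditivity_f(-20.356, 22330, 0.0731, 12)
print(f"non-additivity {ss_na:.4f}, remainder {rem_ss:.4f} (MS {rem_ms:.5f}), F = {f:.2f}")
```

Carried through without intermediate rounding, F comes out near 3.74; the 3.76 in the table is the ratio of the rounded entries 0.0186/0.00495.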

EXAMPLE 11.20.1 — The following data are the numbers of lesions on eight pairs of half leaves inoculated with two strengths of tobacco virus (from table 4.3.1).

                              Replications
Treatments      1     2     3     4     5     6     7     8
    1          31    20    18    17     9     8    10     7
    2          18    17    14    11    10     7     5     6

Test for additivity by the method of section 11.19. Ans.:

Source               Degrees of Freedom   Sum of Squares   Mean Square   F
Error                         7                 65
  Non-additivity              1                 38              38        8.4
  Remainder                   6                 27              4.5

F is significant at the 5% level. The non-additivity may be due to anomalous behavior of the 31, 18 pair.
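For a two-way table such as the virus data, the test can be sketched directly from the observations. The sketch assumes that the method of section 11.19 (not reproduced here) is Tukey's usual one-d.f. formula, in which the non-additivity S.S. is $N^2/(\sum a_i^2 \sum b_j^2)$, with $a_i$, $b_j$ the row and column effects and $N = \sum X_{ij}a_ib_j$; the function name `tukey_one_df` is hypothetical.

```python
def tukey_one_df(table):
    """Tukey's one-degree-of-freedom test for non-additivity in a two-way table.

    `table` is a list of rows (here: treatments), each a list of column values
    (here: replications). Returns the analysis-of-variance entries as a dict.
    """
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_eff = [sum(row) / c - grand for row in table]
    col_eff = [sum(row[j] for row in table) / r - grand for j in range(c)]

    # Error (rows x columns) S.S. of the additive model.
    error_ss = sum((table[i][j] - grand - row_eff[i] - col_eff[j]) ** 2
                   for i in range(r) for j in range(c))

    # N = sum of X * row effect * column effect; S.S. = N^2 / (sum a^2 * sum b^2).
    n = sum(table[i][j] * row_eff[i] * col_eff[j]
            for i in range(r) for j in range(c))
    ss_nonadd = n ** 2 / (sum(a * a for a in row_eff) * sum(b * b for b in col_eff))

    error_df = (r - 1) * (c - 1)
    remainder_ss = error_ss - ss_nonadd
    remainder_ms = remainder_ss / (error_df - 1)
    return {"error_ss": error_ss, "error_df": error_df, "nonadd_ss": ss_nonadd,
            "remainder_ss": remainder_ss, "remainder_ms": remainder_ms,
            "F": ss_nonadd / remainder_ms}


lesions = [
    [31, 20, 18, 17, 9, 8, 10, 7],   # treatment 1
    [18, 17, 14, 11, 10, 7, 5, 6],   # treatment 2
]
result = tukey_one_df(lesions)
print({k: round(v, 2) for k, v in result.items()})
```

Without intermediate rounding this gives an error S.S. of 65, a non-additivity S.S. of about 38.1 and F of about 8.5, in agreement with the answer above to the rounding used there.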


EXAMPLE 11.20.2 — Apply $\sqrt{X + 1}$ to the virus data. While F now becomes nonsignificant, the pair (31, 18) still appears unusual.

EXAMPLE 11.20.3 — The data in example 11.2.1, regarded as a 3 × 3 two-way classification, provide another simple example of Tukey's test. Ans. For non-additivity, F = 5.66.


EXAMPLE 11.20.4 — Analyze the variance of the logarithms of the monkey responses. You will get:

Source               Degrees of Freedom   Sum of Squares   Mean Square   F
Monkey Pairs                  4               0.5244          0.1311
Weeks                         4               0.2294          0.0574
Stimuli                       4               0.2313          0.0578      9.6
Error                        12               0.0725          0.00604


EXAMPLE 11.20.5 — Test all differences among the means in table 11.20.1, using the LSD method. Ans. E > A, B, C; D > A, B; C > A.


EXAMPLE 11.20.6 — Calculate the sum of squares due to the regression of log response on weeks. It is convenient to code the weeks as X = -2, -1, 0, 1, 2. Then, taking the weekly means as Y, $\sum xy = 0.618$ and $(\sum xy)^2/\sum x^2 = 0.03819$. On the per item basis, the sum of squares due to regression is 5(0.03819) = 0.1910. The line for Weeks in example 11.20.4 may now be separated into two parts:

Source                        Degrees of Freedom   Sum of Squares   Mean Square
Linear Regression                     1                0.1910          0.1910
Deviations from Regression            3                0.0384          0.0128

Comparing the mean squares with error, it is seen that deviations are not significant, most of the sum of squares for Weeks being due to the regression.
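The partition of the Weeks line can be reproduced as follows. This sketch simply takes $\sum xy = 0.618$, the Weeks S.S., and the Error mean square from examples 11.20.4 and 11.20.6; the variable names are illustrative.

```python
weeks_x = [-2, -1, 0, 1, 2]            # coded weeks
sum_xy = 0.618                          # sum of x * weekly mean log response (from the text)
sum_x2 = sum(x * x for x in weeks_x)    # = 10
obs_per_mean = 5                        # each weekly mean is over 5 observations

ss_linear = obs_per_mean * sum_xy ** 2 / sum_x2   # linear regression, 1 d.f.
ss_weeks, df_weeks = 0.2294, 4                    # Weeks line of example 11.20.4
ss_dev = ss_weeks - ss_linear                     # deviations from regression, 3 d.f.
ms_dev = ss_dev / (df_weeks - 1)
error_ms = 0.00604                                # Error mean square, 12 d.f.

print(f"linear:     S.S. {ss_linear:.4f}, F {ss_linear / error_ms:.1f}")
print(f"deviations: S.S. {ss_dev:.4f}, M.S. {ms_dev:.4f}, F {ms_dev / error_ms:.2f}")
```

The deviations give F of about 2.1, below the 5% point of F with 3 and 12 d.f. (about 3.49), whereas the linear term gives F of about 32.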
CHAPTER TWELVE


Factorial experiments


12.1 — Introduction. A common problem in research is investigating 
the effects of each of a number of variables, or factors as they are called,
on some response Y. Suppose a company in the food industry proposes 
to market a cake mix from which the housewife can make a cake by adding 
water and then baking. The company must decide on the best kind of 
flour and the correct amounts of fat, sugar, liquid (milk or water), eggs, 
baking powder, and flavoring, as well as on the best oven temperature and 
the proper baking time. These are nine factors, any one of which may 
affect the palatability and the keeping quality of the cake to a noticeable 
degree. Similarly, a research program designed to learn how to increase 
the yields of the principal cereal crop in a country is likely to try to measure 
the effects on yield of different amounts of nitrogen, phosphorus, and 
potassium when added as fertilizers to the soil. Problems of this type 
occur frequently in industry: with complex chemical processes there can 
be as many as 10 to 20 factors that may affect the final product. 

In earlier times the advice was sometimes given to study one factor 
at a time, a separate experiment being devoted to each factor. Later, 
Fisher (1) pointed out that important advantages are gained by combin- 
ing the study of several factors in the same factorial experiment. Factorial 
experimentation is highly efficient, because every observation supplies 
information about all the factors included in the experiment. Secondly, 
as we will see, factorial experimentation is a workmanlike method of in- 
vestigating the relationships between the effects of different factors. 

12.2 — The single factor versus the factorial approach. To illustrate the difference between the "one factor at a time" approach and the factorial approach, consider an investigator who has two factors, A and B, to study. For simplicity, suppose that only two levels of each factor, say $a_1$, $a_2$ and $b_1$, $b_2$, are to be compared. In a cake mix, $a_1$, $a_2$ might be two types of flour and $b_1$, $b_2$ two amounts of flavoring. Four replications are considered sufficient by the investigator.

In the single-factor approach, the first experiment is a comparison of $a_1$ with $a_2$. The level of B is kept constant in the first experiment, but the investigator must decide what this constant level is to be. We shall suppose that B is kept at $b_1$; the choice made does not affect our argument. The two treatments in the first experiment may be denoted by the symbols $a_1b_1$ and $a_2b_1$, replicated four times. The effect of A, that is, the mean difference $a_2b_1 - a_1b_1$, is estimated with a variance $2\sigma^2/4 = \sigma^2/2$.

The second experiment compares $b_2$ with $b_1$. If $a_2$ performed better than $a_1$ in the first experiment, the investigator is likely to use $a_2$ as the constant level of A in the second experiment (again, this choice is not vital to the argument). Thus, the second experiment compares $a_2b_1$ with $a_2b_2$ in four replications, and estimates the effect of B with variance $\sigma^2/2$.

In the two single-factor experiments, 16 observations have been made, and the effects of A and B have each been estimated with variance $\sigma^2/2$.

But suppose that someone else, interested in these factors, hears that experiments on them have been done. He asks the investigator: In my work, I have to keep A at its lower level, $a_1$. What effect does B have when A is at $a_1$? Obviously, the investigator cannot answer this question, since he measured the effect of B only when A was held at its higher level. Another person might ask: Is the effect of A the same at the two levels of B? Once again, the investigator has no answer, since A was tested at only one level of B.

In the factorial experiment, the investigator compares all treatments that can be formed by combining the levels of the different factors. There are four such treatment combinations, $a_1b_1$, $a_2b_1$, $a_1b_2$, $a_2b_2$. Notice that each replication of this experiment supplies two estimates of the effect of A. The comparison $a_2b_2 - a_1b_2$ estimates the effect of A when B is held constant at its higher level, while the comparison $a_2b_1 - a_1b_1$ estimates the effect of A when B is held constant at its lower level. The average of these two estimates is called the main effect of A, the adjective main being a reminder that this is an average taken over the levels of the other factor. In terms of our definition of a comparison (section 10.7) the main effect of A may be expressed as


$$L_A = \tfrac{1}{2}(a_2b_2) + \tfrac{1}{2}(a_2b_1) - \tfrac{1}{2}(a_1b_2) - \tfrac{1}{2}(a_1b_1) \qquad (12.2.1)$$

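where $(a_2b_2)$ denotes the yield given by the treatment combination $a_2b_2$ (or the average yield if the experiment has r replications), and so on. By Rule 10.7.1 the variance of $L_A$ is

$$\frac{\sigma^2}{r}\left\{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right\} = \frac{\sigma^2}{r}$$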

If the investigator uses 2 replications (8 observations), the main effect of A is estimated with a variance $\sigma^2/2$.

Now consider B. Each replication furnishes two estimates, $a_2b_2 - a_2b_1$ and $a_1b_2 - a_1b_1$, of the effect of B. The main effect of B is the comparison

$$L_B = \tfrac{1}{2}(a_2b_2) + \tfrac{1}{2}(a_1b_2) - \tfrac{1}{2}(a_2b_1) - \tfrac{1}{2}(a_1b_1) \qquad (12.2.2)$$

With two replications of the factorial experiment (8 observations), $L_B$, like $L_A$, has variance $\sigma^2/2$.

Thus, the factorial experiment requires only 8 observations, as against 16 by the single-factor approach, to estimate the effects of A and B with the same variance $\sigma^2/2$. With 3 factors, the factorial experiment requires only 1/3 as many observations, with 4 factors only 1/4, and so on. These striking gains in efficiency occur because every observation, like $(a_2b_1)$, or $(a_1b_2c_2)$, or $(a_2b_1c_1d_2)$, is used in the estimate of the effect of every factor. In the single-factor approach, on the other hand, an observation supplies information only about the effect of one factor.

What about the relationship between the effects of the factors? The factorial experiment provides a separate estimate of the effects of A at each level of B, though these estimates are less precise than the main effect of A, their variance being $\sigma^2$. The question "Is the effect of A the same at the two levels of B?" can be examined by means of the comparison

$$\{(a_2b_2) - (a_1b_2)\} - \{(a_2b_1) - (a_1b_1)\} \qquad (12.2.3)$$

This expression measures the difference between the effect of A when B is at its higher level and the effect of A when B is at its lower level. If the question is "Does the level of A influence the effect of B?", the relevant comparison is

$$\{(a_2b_2) - (a_2b_1)\} - \{(a_1b_2) - (a_1b_1)\} \qquad (12.2.4)$$

Notice that (12.2.3) and (12.2.4) are identical. The expression is called the AB two-factor interaction. In this, the combinations $(a_2b_2)$ and $(a_1b_1)$ receive a + sign, the combinations $(a_2b_1)$ and $(a_1b_2)$ a − sign.

Because of its efficiency and comprehensiveness, factorial experi- 
mentation is extensively used in research programs, particularly in in- 
dustry. One limitation is that a factorial experiment is usually larger and 
more complex than a single-factor experiment. The potentialities of fac- 
torial experimentation in clinical medicine have not been fully exploited, 
because it is usually difficult to find enough suitable patients to compare 
more than two or three treatment combinations. 

In analyzing the results of a $2^2$ factorial, the commonest procedure is to look first at the two main effects and the two-factor interaction. If the interaction seems absent, we need only report the main effects, with some assurance that each effect holds at either level of the other variate. A more compact notation for describing the treatment combinations is also standard. The presence of a letter a or b denotes one level of the factor in question, while the absence of the letter denotes the other level. Thus, $a_2b_2$ becomes ab, and $a_1b_2$ becomes b. The combination $a_1b_1$ is denoted by the symbol (1). In this notation, table 12.2.1 shows how to compute the main effects and the interaction from the treatment totals over r replications.
TABLE 12.2.1

Calculation of Main Effects and Interaction in a $2^2$ Factorial

Factorial    Multipliers for Treatment Totals    Divisor to    Contribution to
Effect        (1)      a       b       ab        Give Mean     Treatments S.S.
A             -1      +1      -1      +1            2r           $[A]^2/4r$
B             -1      -1      +1      +1            2r           $[B]^2/4r$
AB            +1      -1      -1      +1            2r           $[AB]^2/4r$


Thus, the main effect of A is:

$$[A]/2r = [(ab) - (b) + (a) - (1)]/2r$$

The quantities [A], [B], [AB] are called factorial effect totals. Use of the same divisor, 2r, for the AB interaction mean is a common convention.

In the analysis of variance, the contribution of the main effect of A to the Treatments S.S. is $[A]^2/4r$, by Rule 11.6.1. Further, note that the three comparisons [A], [B] and [AB] in table 12.2.1 are orthogonal. By Rule 11.6.4, the three contributions in the right-hand column of table 12.2.1 add up to the Treatments S.S.
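The rules of table 12.2.1 translate directly into code. The following Python sketch (the names `MULTIPLIERS` and `factorial_effects` are our own) computes each factorial effect total, its mean with divisor 2r, and its contribution $[\,\cdot\,]^2/4r$ to the Treatments S.S.; the treatment totals used to try it out are those of the riboflavin experiment in table 12.3.1.

```python
# Multipliers of table 12.2.1, in the standard order (1), a, b, ab.
MULTIPLIERS = {
    "A":  (-1, +1, -1, +1),
    "B":  (-1, -1, +1, +1),
    "AB": (+1, -1, -1, +1),
}


def factorial_effects(totals, r):
    """Effect total, effect mean and S.S. contribution for a 2 x 2 factorial.

    `totals` are treatment totals over r replications, ordered (1), a, b, ab.
    """
    out = {}
    for name, coeffs in MULTIPLIERS.items():
        effect_total = sum(c * t for c, t in zip(coeffs, totals))
        out[name] = {
            "total": effect_total,
            "mean": effect_total / (2 * r),       # divisor 2r
            "ss": effect_total ** 2 / (4 * r),    # contribution to Treatments S.S.
        }
    return out


riboflavin_totals = (127.8, 111.1, 75.2, 71.0)   # table 12.3.1, r = 3
effects = factorial_effects(riboflavin_totals, r=3)
for name, e in effects.items():
    print(f"{name:>2}: total {e['total']:7.1f}  mean {e['mean']:6.2f}  S.S. {e['ss']:7.2f}")
```

Because the three sets of multipliers are orthogonal, the three contributions should add up to the Treatments S.S. computed directly as $\sum T^2/r - (\sum T)^2/4r$, which gives a convenient arithmetic check.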

EXAMPLE 12.2.1 — Yates (2) pointed out that the concept of factorial experimentation can be applied to gain accuracy when weighing objects on a balance with two pans. Suppose that two objects are to be weighed and that in any weighing the balance has an error distributed about 0 with variance $\sigma^2$. If the two objects are weighed separately, the balance estimates each weight with variance $\sigma^2$. Instead, both objects are placed in one pan, giving an estimate $y_1$ of the sum of the weights. Then the objects are placed in different pans, giving an estimate $y_2$ of the difference between the weights. Show that the quantities $(y_1 + y_2)/2$ and $(y_1 - y_2)/2$ give estimates of the individual weights with variance $\sigma^2/2$.

EXAMPLE 12.2.2 — If four objects are to be weighed, show how to conduct four weighings so that the weight of each object is estimated with variance $\sigma^2/4$. Hint: First weigh the sum of the objects, then refer to table 12.2.1.

12.3 — Analysis of the $2^2$ factorial experiment. The case where no interaction appears is illustrated by an experiment (3) on the fluorometric determination of the riboflavin content of dried collard leaves (table 12.3.1). The two factors were A, the size of sample (0.25 gm., 1.00 gm.) from which the determination was made, and B, the effect of the inclusion of a permanganate-peroxide clarification step in the determination. This was a randomized blocks design replicated on three successive days.

The usual analysis of variance into Replications, Treatments, and Error is computed. Then the factorial effect totals for A, B, and AB are calculated from the treatment totals, using the multipliers given in table 12.3.1. Their squares are divided by 4r, or 12, to give the contributions to the Treatments S.S. The P value corresponding to the F ratio 13.02/8.18 for Interaction is about 0.25: we shall assume interaction absent. Consequently, attention can be concentrated on the main effects.
The Permanganate step produced a large reduction in the estimated ribo- 
flavin concentration. The effect of Sample Size was not quite significant. 
TABLE 12.3.1

Apparent Riboflavin Concentration (mcg./gm.) in Collard Leaves

                    Without Permanganate        With Permanganate
                    0.25 gm.     1.00 gm.       0.25 gm.     1.00 gm.
Replication         Sample       Sample         Sample       Sample        Total
1                    39.5         38.6           27.2         24.6         129.9
2                    43.1         39.5           23.2         24.2         130.0
3                    45.2         33.0           24.8         22.2         125.2
Total               127.8        111.1           75.2         71.0

Factorial              Multipliers              Factorial       Effect
Effect               (1)    a     b    ab       Effect Total    Mean      S.E.
Sample Size (A)       -1   +1    -1   +1          -20.9          -3.5
Permanganate (B)      -1   -1    +1   +1          -92.7         -15.4     ±1.65
Interaction (AB)      +1   -1    -1   +1           12.5            2.1

Source of Variation   Degrees of Freedom   Sum of Squares                 Mean Square    P
Replications                  2             3.76
Treatments                   (3)            (765.53)
  Sample size                 1             $(-20.9)^2/12 = 36.40$           36.40       0.08
  Permanganate                1             $(-92.7)^2/12 = 716.11$         716.11      <0.01
  Interaction                 1             $(12.5)^2/12 = 13.02$            13.02       0.25
Error                         6             49.08                             8.18



Instead of subdividing the Treatments S.S. and making F-tests, one can proceed directly to compute the factorial effect means. These are obtained by dividing the effect totals by 2r, or 6, and are shown in table 12.3.1 beside the effect totals. The standard error of an effect mean is $\sqrt{s^2/r} = \sqrt{8.18/3} = 1.65$. The t-tests of the effect means are of course the same as the F-tests in the analysis of variance. Use of the effect means has the advantage of showing the magnitude and direction of the effects.
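The whole analysis of table 12.3.1, including the effect means and their standard error $\sqrt{s^2/r}$, can be sketched in plain Python. The layout assumed below (rows are replications, columns in the order (1), a, b, ab) follows the table; only the standard library is used, and the variable names are our own.

```python
import math

# Table 12.3.1: rows are replications (days); columns are (1), a, b, ab.
data = [
    [39.5, 38.6, 27.2, 24.6],
    [43.1, 39.5, 23.2, 24.2],
    [45.2, 33.0, 24.8, 22.2],
]
r, t = len(data), len(data[0])
grand = sum(map(sum, data))
cf = grand ** 2 / (r * t)                                    # correction factor

total_ss = sum(x * x for row in data for x in row) - cf
reps_ss = sum(sum(row) ** 2 for row in data) / t - cf
trt_totals = [sum(row[j] for row in data) for j in range(t)]
trt_ss = sum(T * T for T in trt_totals) / r - cf
error_df = (r - 1) * (t - 1)
error_ms = (total_ss - reps_ss - trt_ss) / error_df

se_effect_mean = math.sqrt(error_ms / r)                     # s.e. of an effect mean
multipliers = {"A": (-1, 1, -1, 1), "B": (-1, -1, 1, 1), "AB": (1, -1, -1, 1)}
for name, coeffs in multipliers.items():
    mean = sum(c * T for c, T in zip(coeffs, trt_totals)) / (2 * r)
    print(f"{name:>2}: effect mean {mean:6.2f}, t = {mean / se_effect_mean:6.2f}")
print(f"Replications {reps_ss:.2f}, Treatments {trt_ss:.2f}, "
      f"Error M.S. {error_ms:.2f}, s.e. {se_effect_mean:.2f}")
```

Each printed t is, apart from sign, the square root of the corresponding F in the analysis of variance; for Sample Size, for example, t is about -2.11 and F = 36.40/8.18 ≈ 4.45.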

The principal conclusion from this experiment was that "In the fluorometric determination of riboflavin of the standard dried collard sample, the permanganate-hydrogen peroxide clarification step is essential. Without this step, the mean value is 39.8 mcg. per gram, while with it the more reasonable mean of 24.4 is obtained." These data are discussed
further in example 12.4.1. 

EXAMPLE 12.3.1 — From table 12.3.1, calculate the means of the four treatment combinations. Then calculate the main effects of A and B, and verify that they are the same as the "Effect Means" shown in table 12.3.1. Verify also that the AB interaction, if calculated by equations (12.2.3) or (12.2.4), is twice the effect mean in table 12.3.1. As already mentioned, the extra divisor 1/2 in the case of an interaction is a convention.

EXAMPLE 12.3.2 — From a randomized blocks experiment on sugar beets in Iowa the numbers of surviving plants per plot were counted as follows:
                                   Blocks
Treatments                1      2      3      4      Totals
None                     183    176    291    254       904
Superphosphate, P        356    300    301    271      1228
Potash, K                224    258    244    217       943
P + K                    329    283    308    326      1246
Totals                  1092   1017   1144   1068      4321


(i) Compute the sums of squares for Blocks, Treatments, and Error. Verify that the Treatments S.S. is 24,801, and the mean square for error is 1494.

(ii) Compute the S.S. for P, K, and the PK interaction. Verify that these add to the Treatments S.S. and that the only significant effect is an increase of about 34% in plant number due to P. This result is a surprise, since P does not usually have marked effects on the number of sugar-beet plants.

(iii) Compute the factorial effect means from the individual treatment means, with their s.e. $\sqrt{s^2/r}$, and verify that t-tests of the factorial effect means are identical to the F-tests in the analysis of variance.

EXAMPLE 12.3.3 — We have seen how to calculate the factorial effect means (A), (B), and (AB) from the means (ab), (a), (b), and (1) of the individual treatment combinations. The process can be reversed: given the factorial effect means and the mean yield M of the experiment, we can recapture the means of the individual treatment combinations. Show that the equations are

$$(ab) = M + \tfrac{1}{2}\{(A) + (B) + (AB)\}$$
$$(a) = M + \tfrac{1}{2}\{(A) - (B) - (AB)\}$$
$$(b) = M + \tfrac{1}{2}\{-(A) + (B) - (AB)\}$$
$$(1) = M + \tfrac{1}{2}\{-(A) - (B) + (AB)\}$$
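A quick numerical check of these equations is to go from the four treatment means to M and the effect means, and back again. This sketch checks only one set of numbers (the riboflavin means from table 12.3.1) and is not a proof; the function names are our own.

```python
def effect_means(m1, ma, mb, mab):
    """Mean M and factorial effect means (A), (B), (AB) from the treatment means."""
    M = (m1 + ma + mb + mab) / 4
    A = (mab - mb + ma - m1) / 2
    B = (mab + mb - ma - m1) / 2
    AB = (mab - ma - mb + m1) / 2     # conventional divisor 2, as for main effects
    return M, A, B, AB


def treatment_means(M, A, B, AB):
    """Inverse map, using the four equations of example 12.3.3; order (1), a, b, ab."""
    return (M + (-A - B + AB) / 2,
            M + (A - B - AB) / 2,
            M + (-A + B - AB) / 2,
            M + (A + B + AB) / 2)


means = tuple(T / 3 for T in (127.8, 111.1, 75.2, 71.0))   # table 12.3.1, r = 3
recovered = treatment_means(*effect_means(*means))
print([round(m, 2) for m in recovered])
```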


12.4 — The $2^2$ factorial when interaction is present. When interaction is present, the results of a $2^2$ experiment require more detailed study. If both main effects are large, an interaction that is significant but much smaller than the main effects may imply merely that there is a minor variation in the effect of A according as B is at its higher or lower level, and vice versa. In this event, reporting of the main effects may still be an adequate summary. But in most cases we must revert to a report based on the 2 × 2 table.

Table 12.4.1 contains the results (slightly modified) of a $2^2$ experiment in a completely randomized design. The factors were vitamin $B_{12}$ (0, 5 mg.) and Antibiotics (0, 40 mg.) fed to swine. A glance at the totals for the four treatment combinations suggests that with no antibiotics, $B_{12}$ had little or no effect (3.66 versus 3.57), apparently because intestinal flora utilized the $B_{12}$. With antibiotics present to control the flora, the effect of the vitamin was marked (4.63 versus 3.10). Looking at the table the other way, the antibiotics alone decreased gain (3.10 versus 3.57), perhaps by suppressing intestinal flora that synthesize $B_{12}$; but with $B_{12}$ added, the antibiotics produced a gain by decreasing the activities of unfavorable flora.






TABLE 12.4.1

Factorial Experiment With Vitamin $B_{12}$ and Antibiotics.
Average Daily Gain of Swine (Pounds)

Antibiotics:             0                   40 mg.
$B_{12}$:            0        5 mg.       0        5 mg.
                   1.30       1.26       1.05       1.52
                   1.19       1.21       1.00       1.56
                   1.08       1.19       1.05       1.55
Totals             3.57       3.66       3.10       4.63

Factorial           Multipliers              Factorial      Effect
Effect            (1)    a     b    ab       Effect Total   Mean       S.E.
$B_{12}$           -1   +1    -1   +1           1.62        0.270**
Antibiotics        -1   -1    +1   +1           0.50        0.083*    ±0.035
Interaction        +1   -1    -1   +1           1.44        0.240**

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Treatments                    3                0.4124
Error                         8                0.0293          0.00366


The summary of the results of this experiment is therefore presented in the form of a table of the means of the four treatment combinations, as shown below:


Antibiotics:             0                   40 mg.
$B_{12}$:            0        5 mg.       0        5 mg.
Means              1.19       1.22       1.03       1.54


In the analysis of variance, $s^2$ is 0.00366, with 8 d.f. The s.e. of the difference between any two treatment means is $\sqrt{2s^2/3} = \pm 0.049$. You may verify that the decrease due to antibiotics when $B_{12}$ is absent, and the increases due to each additive when the other is present, are all clearly significant.
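The verification suggested in the preceding paragraph can be sketched as follows. The code uses the pig data of table 12.4.1, computes $s^2$ and the s.e. of a difference between two treatment means, and compares each simple effect with the two-tailed 5% point of t for 8 d.f., 2.306, taken from a t table; the labels and variable names are our own.

```python
import math

# Table 12.4.1: three pigs per treatment combination (completely randomized).
gains = {
    "(1)": [1.30, 1.19, 1.08],   # no B12, no antibiotics
    "a":   [1.26, 1.21, 1.19],   # B12 only
    "b":   [1.05, 1.00, 1.05],   # antibiotics only
    "ab":  [1.52, 1.56, 1.55],   # both
}
r = 3
means = {k: sum(v) / r for k, v in gains.items()}
error_ss = sum((x - means[k]) ** 2 for k, v in gains.items() for x in v)
error_df = len(gains) * (r - 1)          # 8 d.f.
s2 = error_ss / error_df
se_diff = math.sqrt(2 * s2 / r)          # s.e. of a difference between two means

T_05 = 2.306   # two-tailed 5% point of t with 8 d.f. (from a t table)
simple_effects = {
    "antibiotics, B12 absent":  means["b"] - means["(1)"],
    "antibiotics, B12 present": means["ab"] - means["a"],
    "B12, antibiotics absent":  means["a"] - means["(1)"],
    "B12, antibiotics present": means["ab"] - means["b"],
}
for label, diff in simple_effects.items():
    t = diff / se_diff
    print(f"{label:26s} {diff:+.3f}  t = {t:+.2f}  {'*' if abs(t) > T_05 else 'ns'}")
```

The only simple effect that fails to reach significance is that of $B_{12}$ in the absence of antibiotics, in line with the "little or no effect" noted earlier.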

If, instead, we begin by calculating the factorial effects, as shown in table 12.4.1, we learn from the factorial effect means that there is a significant interaction at the 1% level (0.240 ± 0.035). This immediately directs attention back to the four individual treatment totals or means, in order to study the nature of the interaction and seek an explanation. The main effects both happen to be significant, but are of no interest.

One way of describing the no-interaction situation is to say that the effects of the two factors are additive. To illustrate, suppose that the population mean for the (1) combination (neither factor present) is $\mu$. Factor A, when present alone, changes the mean to $(\mu + \alpha)$; Factor B, when present alone, to $(\mu + \beta)$. If both factors are present, and if their effects are additive, the mean will become $\mu + \alpha + \beta$.

With this model, the interaction effect is

$$(AB) = \tfrac{1}{2}[(ab) + (1) - (a) - (b)] = \tfrac{1}{2}[(\mu + \alpha + \beta) + \mu - (\mu + \alpha) - (\mu + \beta)] = 0$$

Presence of an interaction denotes that the effects are not additive. 

With quantitative factors, this concept leads to two other possible 
explanations of an interaction found in an experiment. Sometimes their 
effects are additive, but on a transformed scale. The simplest example is 
that of multiplicative effects, in which a log transformation of the data 
before analysis (section 11.17) removes the interaction. 

Secondly, if $X_1$, $X_2$ represent the amounts of two factors in a treatment combination, it is natural to summarize the results by means of a response function or response surface, which predicts how the response Y varies as $X_1$ and $X_2$ are changed. If the effects are additive, the response function has the simple form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$


A significant interaction is a warning that this model is not an adequate fit. The interaction effect may be shown to represent a term of the form $\beta_{12}X_1X_2$ in the response function. The presence of a term in $X_1X_2$ in the response function suggests that terms in $X_1^2$ and $X_2^2$ may also be needed to represent the function adequately. In other words, the investigator may require a quadratic response function. Since at least three levels of each variable are required to fit a quadratic surface, he may have to plan a larger factorial experiment.
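The statement that the interaction effect represents a term in $X_1X_2$ can be illustrated numerically. In this sketch we assume, purely for illustration, that the two levels of each factor are coded -1 and +1; with that coding the interaction effect mean (AB) of section 12.4 equals $2\beta_{12}$, and it vanishes for an additive surface. The coefficient values and function names are arbitrary choices of ours.

```python
def corner_means(b0, b1, b2, b12):
    """Responses of Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 at the four treatment
    combinations, each factor coded -1 (lower level) or +1 (higher level).
    Order: (1), a, b, ab."""
    def y(x1, x2):
        return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    return y(-1, -1), y(+1, -1), y(-1, +1), y(+1, +1)


def interaction_effect(m1, ma, mb, mab):
    """(AB) = [(ab) + (1) - (a) - (b)] / 2, as in section 12.4."""
    return (mab + m1 - ma - mb) / 2


print(interaction_effect(*corner_means(10, 2, 3, 0)))     # additive surface: 0.0
print(interaction_effect(*corner_means(10, 2, 3, 0.5)))   # b12 = 0.5 gives 1.0
```

With a different coding of the levels the multiplier relating (AB) to $\beta_{12}$ changes, but (AB) is still zero exactly when the cross-product term is absent.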

EXAMPLE 12.4.1 — Our use of the riboflavin data in section 12.3 as an example with no interaction might be criticized on two grounds: (1) a P value of 0.25 in the test for interaction in a small experiment suggests the possibility of an interaction that a larger experiment might reveal; (2) perhaps the effects are multiplicative in these data. If you analyze the logs of the data in table 12.3.1, you will find that the F-value for interaction is now only 0.7. Thus the assumption of zero interaction seems better grounded on a log scale than on the original scale.


12.5 — The general two-factor experiment. Leaving the special case of 
two levels per factor, we now consider the general arrangement with 
a levels of the first factor and b levels of the second. As before, the layout 
of the experiment may be completely randomized, randomized blocks, or 
any other standard plan. 

With a levels, the main effects of A in the analysis of variance now have (a - 1) d.f., while those of B have (b - 1) d.f. Since there are ab treatment combinations, the Treatments S.S. has (ab - 1) d.f. Consequently, there remain

$$(ab - 1) - (a - 1) - (b - 1) = ab - a - b + 1 = (a - 1)(b - 1)$$

d.f., which may be shown to represent the AB interactions. In the 2 × 2 factorial, in which the AB interaction had only one d.f., the comparison corresponding to this d.f. was called the AB interaction. In the general case, the AB interaction represents a set of (a - 1)(b - 1) independent comparisons. These can be subdivided into single comparisons in many ways.

In deciding how to subdivide the AB sum of squares, the investigator 
is guided by the questions that he had in mind when planning the experi- 
ment. Any comparison among the levels of A is estimated independently 
at each of the b levels of B. For a comparison that is of particular interest, 
the investigator may wish to examine whether the level of B affects these 
estimates. The sum of squares of deviations of the estimates, with the 
appropriate divisor, is a component of the AB interaction, with (b - 1) 
d.f., which may be isolated and tested against the Error mean square. 
Incidentally, since the main effect of A represents (a - 1) independent comparisons, these components of the AB interaction jointly account for (a - 1)(b - 1) d.f. and will be found to sum to the sum of squares for AB.

As an illustration, the data in table 12.5.1 show the gains in weight of 
male rats under six feeding treatments in a completely randomized experi- 
ment. The factors were: 

A (3 levels): Source of protein: Beef, Cereal, Pork

B (2 levels): Level of protein: High, Low

Often the investigator has decided in advance how to subdivide the comparisons that represent main effects and interactions. In more exploratory situations, it is customary to start with a breakdown of the Treatments S.S. into the S.S. for A, B, and AB. This has been done in table 12.5.1. Looking at the main effects of A, the three sources of protein show no differences in average rates of gain (F = 0.6), but there is a clear effect of level of protein (F = 14.8), the gain being about 18% larger with the High level.

TABLE 12.5.1

Gains in Weight (Grams) of Rats Under Six Diets

              High Protein                 Low Protein
        Beef    Cereal    Pork       Beef    Cereal    Pork
          73       98       94         90      107       49
         102       74       79         76       95       82
         118       56       96         90       97       73
         104      111       98         64       80       86
          81       95      102         86       98       81
         107       88      102         51       74       97
         100       82      108         72       74      106
          87       77       91         90       67       70
         117       86      120         95       89       61
         111       92      105         78       58       82
Totals 1,000      859      995        792      839      787

Source of Variation       Degrees of Freedom   Sum of Squares   Mean Square   F
Treatments                        5               4,613.0
  A (Source of protein)           2                 266.5          133.2       0.6
  B (Level of protein)            1               3,168.3        3,168.3      14.8**
  AB                              2               1,178.2          589.1       2.7
Error                            54              11,585.7          214.6

For AB, the value of F is 2.7, between the 10% and the 5% level. In the general two-factor experiment and in more complex factorials, it often happens that a few of the comparisons comprising the main effects have substantial interactions while the majority of the comparisons have negligible interactions. Consequently, the F-test of the AB interaction sum of squares as a whole is not a good guide as to whether interactions can be ignored. It is well to look over the two-way table of treatment totals or means before concluding that there are no interactions, particularly if F is larger than 1.
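The breakdown of the Treatments S.S. in table 12.5.1 into A, B, and AB needs only the six treatment totals. The sketch below assumes a complete, balanced table (every cell with the same number n of observations); `two_way_ss` is a hypothetical helper name, and the Error mean square 214.6 is taken from the table.

```python
from collections import defaultdict


def two_way_ss(cell_totals, n):
    """Treatments, A, B and AB sums of squares from the cell totals of a
    complete, balanced a x b factorial with n observations per cell.

    `cell_totals` maps (level of A, level of B) -> total.
    """
    grand = sum(cell_totals.values())
    n_obs = n * len(cell_totals)
    cf = grand ** 2 / n_obs                       # correction factor
    a_totals, b_totals = defaultdict(float), defaultdict(float)
    for (a, b), total in cell_totals.items():
        a_totals[a] += total
        b_totals[b] += total
    ss_treat = sum(x * x for x in cell_totals.values()) / n - cf
    ss_a = sum(x * x for x in a_totals.values()) / (n_obs / len(a_totals)) - cf
    ss_b = sum(x * x for x in b_totals.values()) / (n_obs / len(b_totals)) - cf
    return {"Treatments": ss_treat, "A": ss_a, "B": ss_b, "AB": ss_treat - ss_a - ss_b}


rat_totals = {   # table 12.5.1, n = 10 rats per diet
    ("Beef", "High"): 1000, ("Cereal", "High"): 859, ("Pork", "High"): 995,
    ("Beef", "Low"): 792,   ("Cereal", "Low"): 839,  ("Pork", "Low"): 787,
}
ss = two_way_ss(rat_totals, n=10)
error_ms = 214.6            # 54 d.f., from table 12.5.1
df = {"A": 2, "B": 1, "AB": 2}
for name in ("A", "B", "AB"):
    print(f"{name:>2}: S.S. {ss[name]:8.1f}  F = {ss[name] / df[name] / error_ms:.1f}")
```

The printed F values should round to the 0.6, 14.8, and 2.7 of table 12.5.1.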

Another working rule tested by experience in a number of areas is 
that large main effects are more likely to have interactions than small
ones. Consequently, we look particularly at the effects of B, Level of 
protein. From the treatment totals in table 12.5.1 we see that high pro- 
tein gives large gains over low protein for beef and pork, but only a small 
gain for cereal. This suggests a breakdown into: (1) Cereal versus the 
average of Beef and Pork, and (2) Beef versus Pork. This subdivision is 
a natural one, since Beef and Pork are animal sources of protein while 
Cereal is a vegetable source, and would probably be planned from the 
beginning in this type of experiment. 

Table 12.5.2 shows how this breakdown is made by means of five 
single comparisons. Study the coefficients for each comparison carefully, 
and verify that the comparisons are mutually orthogonal. In the lower 
part of the table the divisors required to convert the squares of the factorial effect totals into sums of squares in the analysis of variance are given. Each divisor is n times the sum of squares of the coefficients in the comparison (n = 10). As anticipated, the interaction of the animal versus
vegetable comparison with level of protein is significant at the 5% level. 
There is no sign of a difference between Beef and Pork at either level. 
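The single comparisons of table 12.5.2 can be computed, and their mutual orthogonality checked, with a short Python sketch. The coefficients and totals are those of the table; the divisor is n times the sum of squared coefficients with n = 10, and the shortened comparison labels are our own.

```python
# Treatment totals in the column order of table 12.5.2:
# High: Beef, Cereal, Pork; Low: Beef, Cereal, Pork (n = 10 rats each).
totals = [1000, 859, 995, 792, 839, 787]
n = 10
comparisons = [
    ("Level of protein",             (+1, +1, +1, -1, -1, -1)),
    ("Animal vs. vegetable",         (+1, -2, +1, +1, -2, +1)),
    ("Animal vs. vegetable x level", (+1, -2, +1, -1, +2, -1)),
    ("Beef vs. pork",                (+1,  0, -1, +1,  0, -1)),
    ("Beef vs. pork x level",        (+1,  0, -1, -1,  0, +1)),
]

# Every pair of comparisons should be orthogonal (products of coefficients sum to 0).
for i, (_, ci) in enumerate(comparisons):
    for _, cj in comparisons[i + 1:]:
        assert sum(x * y for x, y in zip(ci, cj)) == 0

error_ms = 214.6   # 54 d.f.
results = {}
for name, coeffs in comparisons:
    effect_total = sum(c * t for c, t in zip(coeffs, totals))
    divisor = n * sum(c * c for c in coeffs)      # n times sum of squared coefficients
    ss = effect_total ** 2 / divisor
    results[name] = (effect_total, divisor, ss)
    print(f"{name:30s} total {effect_total:4d}  divisor {divisor:3d}  "
          f"S.S. {ss:7.1f}  F {ss / error_ms:5.2f}")
```

The five S.S. add up to the Treatments S.S. of about 4,612.9, as they must for a complete set of orthogonal comparisons.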

The principal results can therefore be summarized in the following 
2x2 table of means. 


Mean Rat Gains in Weight per Week (Grams)

Level of              Source of Protein
Protein          Animal     Vegetable     Difference      S.E.
High              99.8        85.9          +13.9*        ±5.67
Low               79.0        83.9           -4.9         ±5.67
Difference       +20.8**      +2.0
S.E.              ±4.6        ±6.5





TABLE 12.5.2

Subdivision of the S.S. for Main Effects and Interactions

                                  High Protein           Low Protein         Factorial
Comparisons                    Beef  Cereal  Pork     Beef  Cereal  Pork     Effect
(Treatment Totals)             1000    859    995      792    839    787     Total
Level of protein                +1     +1     +1       -1     -1     -1       436
Animal vs. vegetable            +1     -2     +1       +1     -2     +1       178
  Interaction with level        +1     -2     +1       -1     +2     -1       376
Beef vs. pork                   +1      0     -1       +1      0     -1        10
  Interaction with level        +1      0     -1       -1      0     +1         0

Comparison                  Divisor for S.S.   Degrees of Freedom   Sum of Squares   Mean Square
Level of protein                   60                  1               3,168.3**
Animal vs. vegetable              120                  1                 264.0
  Interaction with level          120                  1               1,178.1*
Beef vs. pork                      40                  1                   2.5
  Interaction with level           40                  1                   0.0
Error                                                 54                                   214.6


As a consequence of the interaction, the animal proteins gave sub- 
stantially greater gains in weight than cereal protein at the high level, but 
showed no superiority to cereal protein at the low level. 

12.6 — Response curves. Frequently, the levels of a factor represent 
increasing amounts X of some substance. It may then be of interest to 
examine whether the response Y to the factor has a linear relation to the 
amount X. An example has already been given in section 11.8, p. 313, 
in which the linear regression of yield of millet on width of spacing of the 
rows was worked out for a Latin square experiment. If the relation be- 
tween Y and X is curved, a more complex mathematical expression is re- 
quired to describe it. Sometimes the form of this expression is suggested 
by subject-matter knowledge. Failing this, a polynomial in X is often 
used as a descriptive equation. 

With equally spaced levels of X, auxiliary tables are available that 
facilitate the fitting of these polynomials. The tables are explained fully 
in section 15.6 (p. 460). An introduction is given here to enable them to 
be used in the analysis of factorial experiments. The tables are based essentially on an ingenious coding of the values of X, $X^2$, and so on.

With three levels, the values of X are coded as -1, 0, +1, so that they sum to 0. If $Y_1$, $Y_2$, $Y_3$ are the corresponding response totals over n replicates, the linear regression coefficient $b_1$ is $\sum XY/n\sum X^2$, or $(Y_3 - Y_1)/2n$. The values of $X^2$ are 1, 0, 1. Subtracting their mean 2/3 so that they add to 0 gives 1/3, -2/3, 1/3. Multiplying by 3 in order to have whole numbers, we get the coefficients 1, -2, 1. In its coded form, this variable is $X_2 = 3X^2 - 2$. The regression coefficient of Y on $X_2$ is $b_2 = \sum X_2Y/n\sum X_2^2$, or $(Y_3 - 2Y_2 + Y_1)/6n$. The equation for the parabola fitted to the level means of Y is

$$\hat{Y} = \bar{Y} + b_1X + b_2X_2 \qquad (12.6.1)$$

With four levels of X, they are coded -3, -1, +1, +3, so that they are whole numbers adding to 0. The values of $X^2$ are 9, 1, 1, 9, with mean 5. Subtracting the mean gives +4, -4, -4, +4, which we divide by 4 to give the coefficients +1, -1, -1, +1 for the parabolic component. These components represent the variable $X_2 = (X^2 - 5)/4$. The fitted parabola has the same form as (12.6.1), where

$$b_1 = (3Y_4 + Y_3 - Y_2 - 3Y_1)/20n, \qquad b_2 = (Y_4 - Y_3 - Y_2 + Y_1)/4n,$$

the $Y_i$ being level totals. For the cubic component (term involving $X^3$) a more elaborate coding is required to make this orthogonal to X and $X_2$. The resulting coefficients are -1, +3, -3, +1.

By means of these polynomial components, the S.S. for the main 
effects of the factor can be subdivided into linear, quadratic, cubic com- 
ponents, and so on. Each S.S. can be tested against the Error mean 
square as a guide to the type of polynomial that describes the response 
curve. By rule 11.6.1, the contribution of any component $\sum\lambda_i Y_i$ to the
S.S. is $(\sum\lambda_i Y_i)^2/n\sum\lambda_i^2$. If the component is computed from the level
means, as in the following illustration, the divisor is $(\sum\lambda_i^2)/n$.

Table 12.6.1 presents the mean yields of sugar (cwt. per acre) in an
experiment (4) on beet sugar in which a mixture of fertilizers was applied
at four levels (0, 4, 8, 12 cwt. per acre).

TABLE 12.6.1

Linear, Quadratic, and Cubic Components of Response Curve

                  Mixed Fertilizers (Cwt. per Acre)
                    0      4      8     12     Component   Sum of Squares       F
Mean yields       34.8   41.1   42.6   41.8
Linear              -3     -1     +1     +3       +22.5          202.5       17.0**
Quadratic           +1     -1     -1     +1        -7.1          100.8        8.5*
Cubic               -1     +3     -3     +1        +2.5            2.5        0.2

Total = Sum of Squares for Fertilizers = 305.8

Error mean square (16 d.f.) = 11.9

Since each mean was taken over n = 8 replicates, the divisors are 
20/8 = 2.5 for the linear and cubic components and 4/8 = 0.5 for the 
quadratic component. The Error mean square was 11.9 with 16 d.f. The
positive linear component and the negative quadratic component are
both significant, but the cubic term gives an F less than 1. The
conclusions are: (i) mixed fertilizers produced an increase in the yield of
sugar; (ii) the rate of increase fell off with the higher levels.
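The subdivision in table 12.6.1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the four-level coefficients of table 12.6.2, level means taken over n = 8 replicates, and the Error mean square 11.9; the names `COEFS` and `components` are ours, not the text's.

```python
# Orthogonal coefficients for 4 equally spaced levels (table 12.6.2)
COEFS = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def components(means, n, error_ms):
    """Return {name: (component, S.S., F)} for components computed from
    level means, each mean being taken over n replicates."""
    out = {}
    for name, lam in COEFS.items():
        comp = sum(l * y for l, y in zip(lam, means))
        divisor = sum(l * l for l in lam) / n   # (sum of lambda^2)/n for means
        ss = comp ** 2 / divisor
        out[name] = (comp, ss, ss / error_ms)
    return out


sugar = [34.8, 41.1, 42.6, 41.8]   # mean yields at 0, 4, 8, 12 cwt. per acre
res = components(sugar, n=8, error_ms=11.9)
for name, (comp, ss, f) in res.items():
    print(f"{name:9s} {comp:+6.1f} {ss:7.1f} {f:5.1f}")
print("total S.S.", round(sum(v[1] for v in res.values()), 1))
```

Because the three components are orthogonal, their sums of squares add up to the Fertilizers S.S., 305.8.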

To fit the parabola, we compute from table 12.6.1,

$$\bar{Y} = 40.08, \qquad b_1 = +22.5/20 = 1.125, \qquad b_2 = -7.1/4 = -1.775$$

The fitted parabola is therefore

$$\hat{Y} = 40.08 + 1.125X - 1.775X_2, \qquad (12.6.2)$$

where $\hat{Y}$ is an estimated mean yield. The estimated yields for 0, 4, 8, 12
cwt. of fertilizers are 34.93, 40.73, 42.98, 41.68 cwt. per acre. Like the
observed means, the parabola suggests that the dressing for maximum
yield is around 8 cwt. per acre.
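The fitting of (12.6.2) can be sketched as follows. This assumes the coded variables X = −3, −1, +1, +3 and $X_2$ = +1, −1, −1, +1 from the preceding paragraphs. With the unrounded mean, 40.075, the fitted values agree with those quoted in the text to within 0.01; the text rounded the mean to 40.08 before computing them.

```python
means = [34.8, 41.1, 42.6, 41.8]   # mean yields at 0, 4, 8, 12 cwt.
x_lin = [-3, -1, 1, 3]             # coded X
x_quad = [1, -1, -1, 1]            # coded X2 = (X**2 - 5)/4

ybar = sum(means) / len(means)
# With orthogonal coded variables each coefficient is found separately:
# b = sum(x * mean) / sum(x ** 2)
b1 = sum(x * y for x, y in zip(x_lin, means)) / sum(x * x for x in x_lin)
b2 = sum(x * y for x, y in zip(x_quad, means)) / sum(x * x for x in x_quad)

fitted = [ybar + b1 * x + b2 * q for x, q in zip(x_lin, x_quad)]
print(f"Ybar = {ybar:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print("fitted:", [f"{v:.3f}" for v in fitted])
```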

Table 12.6.2 shows the coefficients for the polynomial components
and the values of $\sum\lambda^2$ for factors having from 2 to 7 levels. With k levels
a polynomial of degree (k − 1) can be made to fit the k responses exactly.


TABLE 12.6.2

Coefficients and Divisors for Sets of Orthogonal Components in Regression
If X Is Spaced at Equal Intervals

Degree of                    Coefficients at Levels                 Divisor
Polynomial  Comparison     1    2    3    4    5    6    7          Σλ²
1           Linear        -1   +1                                      2
2           Linear        -1    0   +1                                 2
            Quadratic     +1   -2   +1                                 6
3           Linear        -3   -1   +1   +3                           20
            Quadratic     +1   -1   -1   +1                            4
            Cubic         -1   +3   -3   +1                           20
4           Linear        -2   -1    0   +1   +2                      10
            Quadratic     +2   -1   -2   -1   +2                      14
            Cubic         -1   +2    0   -2   +1                      10
            Quartic       +1   -4   +6   -4   +1                      70
5           Linear        -5   -3   -1   +1   +3   +5                 70
            Quadratic     +5   -1   -4   -4   -1   +5                 84
            Cubic         -5   +7   +4   -4   -7   +5                180
            Quartic       +1   -3   +2   +2   -3   +1                 28
            Quintic       -1   +5  -10  +10   -5   +1                252
6           Linear        -3   -2   -1    0   +1   +2   +3            28
            Quadratic     +5    0   -3   -4   -3    0   +5            84
            Cubic         -1   +1   +1    0   -1   -1   +1             6
            Quartic       +3   -7   +1   +6   +1   -7   +3           154
            Quintic       -1   +4   -5    0   +5   -4   +1            84
            Sextic        +1   -6  +15  -20  +15   -6   +1           924
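The whole of table 12.6.2 can be regenerated by Gram-Schmidt orthogonalization of 1, x, x², ... over k equally spaced points, followed by scaling each component to the smallest whole numbers. This is a sketch of that standard construction, not the authors' method of computation. The function name `orthogonal_poly_coefs` and the convention of making the last coefficient positive (which matches the table) are our assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def orthogonal_poly_coefs(k):
    """Whole-number coefficients and divisors of the orthogonal polynomial
    components for k equally spaced levels (Gram-Schmidt on 1, x, x**2, ...)."""
    x = [Fraction(i) for i in range(k)]   # any equally spaced coding works
    basis = [[Fraction(1)] * k]           # the constant (mean) term
    result = []
    for degree in range(1, k):
        v = [xi ** degree for xi in x]
        # Remove the projection on every lower-degree component
        for b in basis:
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
        # Clear fractions, then divide out the common factor
        lcm = reduce(lambda p, q: p * q // gcd(p, q), (f.denominator for f in v), 1)
        ints = [int(f * lcm) for f in v]
        common = reduce(gcd, (abs(i) for i in ints))
        ints = [i // common for i in ints]
        if ints[-1] < 0:                  # table convention: last coefficient positive
            ints = [-i for i in ints]
        result.append((ints, sum(i * i for i in ints)))
    return result


for k in range(2, 8):
    for coefs, divisor in orthogonal_poly_coefs(k):
        print(k, coefs, divisor)
```

Exact `Fraction` arithmetic is used so that the scaled coefficients come out as exact integers rather than rounded floats.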


EXAMPLE 12.6.1 — In the same sugar-beet experiment, the mean yields of tops (green
matter) for 0, 4, 8, 12 cwt. of fertilizers were 9.86, 11.58, 13.95, 14.95 cwt. per acre. The
Error mean square was 0.909. Show that: (i) only the linear component is significant, there
being no apparent decline in response to the higher applications; (ii) the S.S. for the linear,
quadratic, and cubic components sum to the S.S. between levels, 127.14 with 3 d.f. Remember
that the means are over 8 replicates.
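A sketch of the check asked for in example 12.6.1 follows. It assumes, as in the sugar analysis, that the Error mean square has 16 d.f., so that the 5% point of F with 1 and 16 d.f. is about 4.49 (a tabled value, not computed here).

```python
means = [9.86, 11.58, 13.95, 14.95]   # tops, cwt. per acre, at 0, 4, 8, 12 cwt.
n, error_ms = 8, 0.909
coefs = {"linear": [-3, -1, 1, 3], "quadratic": [1, -1, -1, 1], "cubic": [-1, 3, -3, 1]}

ss = {}
for name, lam in coefs.items():
    comp = sum(l * y for l, y in zip(lam, means))
    ss[name] = comp ** 2 / (sum(l * l for l in lam) / n)
    print(f"{name:9s} S.S. = {ss[name]:7.2f}  F = {ss[name] / error_ms:6.2f}")

# Between-levels S.S. computed directly from the means (n replicates each)
grand = sum(means) / len(means)
between = n * sum((y - grand) ** 2 for y in means)
print(f"sum of components = {sum(ss.values()):.2f}, between levels = {between:.2f}")
```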
EXAMPLE 12.6.2 — From the results for the parabolic regression on yield of sugar,
the estimated optimum dressing can be computed by calculus. From equation (12.6.2) the
fitted parabola is

$$\hat{Y} = 40.08 + 1.125X - 1.775X_2,$$

where $X_2 = (X^2 - 5)/4$. Thus

$$\hat{Y} = 40.08 + 1.125X - 0.444(X^2 - 5)$$

Differentiating, we find a turning value at X = 1.125/0.888 = 1.27 on the coded scale.
You may verify that the estimated maximum sugar yield is 43.0 cwt. per acre, for a dressing
of 8.5 cwt. of fertilizer.
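The verification in example 12.6.2 can be sketched numerically. The decoding step assumes, from the four-level coding, that coded X = −3, −1, +1, +3 corresponds to 0, 4, 8, 12 cwt., i.e. dressing = 6 + 2X.

```python
b0, b1, b2 = 40.08, 1.125, -1.775   # coefficients of equation (12.6.2)

# Y = b0 + b1*X + b2*(X**2 - 5)/4, so dY/dX = b1 + (b2/2)*X = 0 at the turning value
x_opt = -b1 / (b2 / 2)
y_opt = b0 + b1 * x_opt + b2 * (x_opt ** 2 - 5) / 4
dressing = 6 + 2 * x_opt            # back to cwt. of fertilizer
print(f"X = {x_opt:.2f}, max yield = {y_opt:.1f} cwt., dressing = {dressing:.1f} cwt.")
```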

12.7 — Response curves in two-factor experiments. Either or both
factors may be quantitative and may call for the fitting of a regression as
described in the previous section. As an example with one quantitative
factor, table 12.7.1 shows the yields in a 3 x 3 factorial on hay (5), one
factor being three widths of spacing of the rows, the other being three
varieties.

TABLE 12.7.1

Yield of Cowpea Hay (Pounds per 1/100 Morgen Plot) From Three Varieties

Treatment combination totals (4 blocks)

                 Spacing
Variety       4"     8"    12"
I            190    203    223
II           249    234    209
III          224    257    292

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Blocks                        3                255.64
Varieties, V                  2               1027.39         513.70**
Spacings, S                   2                155.06          77.53*
Interactions, VS              4                765.44         191.36**
Error                        24                424.11          17.67

The original analysis of variance, at the foot of table 12.7.1, reveals 
marked VS (variety x spacing) interactions. The table of treatment com- 
bination totals immediately above shows that there is an upward trend 
in yield with wider spacing for varieties I and III but an opposite trend 
with variety II. This presumably accounts for the large VS mean square 
and warns that no useful overall statements can be made from the main 
effects. 

To examine the trends of yield Y on spacing X, the linear and quad- 
ratic components are calculated for each variety, table 12.7.2. The fac- 
torial effect totals for these components are computed first, then the cor- 
responding sums of squares. Note the following results from table 12.7.2:

(i) As anticipated, the linear slopes are positive for varieties I and III 
and negative for variety II. 

(ii) The linear trend for each variety is significant at the 1% level, 
while no variety shows any sign of curvature, when tested against the 
Error mean square of 17.67. 


TABLE 12.7.2

Linear and Quadratic Components for Each Variety in Cowpea Experiment

                    Spacing                 Totals for Components
                  4"     8"    12"          Linear    Quadratic
Linear            -1      0     +1
Quadratic         +1     -2     +1
Variety I        190    203    223            33          7
Variety II       249    234    209           -40        -10
Variety III      224    257    292            68          2
Sum              663    694    724            61         -1

Contributions to Sums of Squares

               Linear                          Quadratic
Variety I      (33)²/(4)(2)  = 136.12**       (7)²/(4)(6)   = 2.04
Variety II     (-40)²/(4)(2) = 200.00**       (-10)²/(4)(6) = 4.17
Variety III    (68)²/(4)(2)  = 578.00**       (2)²/(4)(6)   = 0.17
Total                          914.12                         6.38

Verification: 914.12 + 6.38 = 920.50 = 155.06 + 765.44 (S + VS)
(iii) The sum of these six S.S. is identical with the S.S. for spacings
and interactions combined, 920.50.

(iv) If the upward trends for varieties I and III are compared, the 
trend for variety III will be found significantly greater. 
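Results (i) to (iv) can be checked with the sketch below, which uses the variety-by-spacing totals of table 12.7.2, each over r = 4 blocks. For the comparison of slopes in (iv), the coefficients are +1, 0, −1 on variety I and −1, 0, +1 on variety III, so that Σλ² = 4. The 5% point of F with 1 and 24 d.f., about 4.26, is a tabled value we assume rather than compute. The helper name `ss` is ours.

```python
# Totals over r = 4 blocks at spacings 4", 8", 12" (table 12.7.2)
totals = {"I": [190, 203, 223], "II": [249, 234, 209], "III": [224, 257, 292]}
r, error_ms = 4, 17.67
lin, quad = [-1, 0, 1], [1, -2, 1]


def ss(lam, y):
    """Comparison on totals and its S.S.: (sum lambda*Y)^2 / (r * sum lambda^2)."""
    c = sum(l * v for l, v in zip(lam, y))
    return c, c * c / (r * sum(l * l for l in lam))


rows = {v: (ss(lin, y), ss(quad, y)) for v, y in totals.items()}
for v, ((cl, sl), (cq, sq)) in rows.items():
    print(f"{v:4s} linear {cl:+4d} S.S. {sl:7.2f}   quadratic {cq:+4d} S.S. {sq:5.2f}")
total_ss = sum(sl + sq for (_, sl), (_, sq) in rows.values())
print(f"sum of six S.S. = {total_ss:.2f}")   # equals S + VS

# (iv) and example 12.7.1: difference between the linear slopes of III and I
diff = rows["III"][0][0] - rows["I"][0][0]
ss_diff = diff ** 2 / (r * 4)
print(f"III - I: {diff}, S.S. = {ss_diff:.2f}, F = {ss_diff / error_ms:.2f}")
```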

To summarize, the varieties have linear trends on spacing which are 
not the same. Apparently I and III have heavy vegetative growth which 
requires more than 12" spacing for maximum yield. In a further experi- 
ment the spacings tested for varieties I and III should differ from those 
for II. 

EXAMPLE 12.7.1 — In the variety x spacing experiment, verify the statement that the
linear regression of yield on width of spacing is significantly greater for variety III than for 
variety I. 

EXAMPLE 12.7.2 — If the primary interest in this experiment were in comparing the
varieties when each has its highest-yielding spacing, we might compare the totals 223 (I),
249 (II), and 292 (III). Show that the optimum for III exceeds the others at the 1% level.

12.8 — Example of a response surface. We turn now to a 3 x 4 experi- 
ment in which there is regression in each factor.* The data are from the 
Foods and Nutrition Section of the Iowa Agricultural Experiment Sta- 
tion (6). The object was to learn about losses of ascorbic acid in snap
beans stored at 3 temperatures for 4 periods, each 2 weeks longer than the
preceding. The beans were all harvested under uniform conditions before 
eight o’clock one morning. They were prepared and quick-frozen before 
noon of the same day. Three packages were assigned at random to each 
of the 12 treatments and all packages were stored at random positions in 
the locker, a completely randomized design. 

The sums of 3 ascorbic acid determinations are recorded in table
12.8.1. It is clear that the concentration of ascorbic acid decreases with
higher storage temperatures and, except at 0°, with storage time. It looks
as if the rate of decrease with temperature is not linear and not the same
for the several storage periods. These conclusions, suggested by inspec-
tion of table 12.8.1, will be tested in the following analysis.

TABLE 12.8.1

Sum of Three Ascorbic Acid Determinations (mg./100 g.) for Each of 12 Treatments
in a 3 x 4 Factorial Experiment on Snap Beans

                        Weeks of Storage
Temperature, °F.     2      4      6      8      Sum
0                   45     47     46     46      184
10                  45     43     41     37      166
20                  34     28     21     16       99
Sum                124    118    108     99      449

Source of Variation    Degrees of Freedom   Sum of Squares   Mean Square
Temperature, T                 2                334.39
Two-week Period, P             3                 40.53
Interaction, TP                6                 34.05
Error*                        24                                0.706

* Error (packages of same treatment) was calculated from original data not recorded here.

One can look first at either temperature or period; we chose tem-
perature. At each period the linear and quadratic temperature com-
parisons (−1, 0, +1; +1, −2, +1) are calculated:

                      Weeks of Storage
                     2      4      6      8     Total
Linear, T_L        -11    -19    -25    -30      -85
Quadratic, T_Q     -11    -11    -15    -12      -49


The downward slopes of the linear regressions get steeper with time. This 
will be examined later. At present, calculate sums of squares as follows: 


$$T_L = \frac{(-85)^2}{(12)(2)} = 301.04^{**}, \qquad T_Q = \frac{(-49)^2}{(12)(6)} = 33.35^{**}$$

The sum is the sum of squares for T, 301.04 + 33.35 = 334.39. Sig-
nificance in each effect is tested by comparison with the Error mean square,
0.706. Evidently the regressions are curved, the parabolic comparison
being significant; quality decreases with accelerated rapidity as the tem-
perature increases. (Note the number of replications in each temperature
total, 4 periods times 3 packages = 12.)
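The temperature comparisons at each period and their sums of squares can be computed as below. This is a sketch using the totals of table 12.8.1 as a nested list (rows 0°, 10°, 20° F., columns 2, 4, 6, 8 weeks); the variable names are our own.

```python
# Sums of 3 determinations: rows = 0, 10, 20 deg. F, columns = 2, 4, 6, 8 weeks
Y = [[45, 47, 46, 46],
     [45, 43, 41, 37],
     [34, 28, 21, 16]]
lin, quad = [-1, 0, 1], [1, -2, 1]

# Linear and quadratic temperature comparisons at each storage period
TL = [sum(l * Y[t][p] for t, l in enumerate(lin)) for p in range(4)]
TQ = [sum(q * Y[t][p] for t, q in enumerate(quad)) for p in range(4)]
print(TL, sum(TL))   # [-11, -19, -25, -30] -85
print(TQ, sum(TQ))   # [-11, -11, -15, -12] -49

n_T = 12             # 4 periods x 3 packages in each temperature total
ss_TL = sum(TL) ** 2 / (n_T * 2)
ss_TQ = sum(TQ) ** 2 / (n_T * 6)
print(f"T_L = {ss_TL:.2f}, T_Q = {ss_TQ:.2f}, T = {ss_TL + ss_TQ:.2f}")
```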

Are the regressions the same for all periods? To answer this, calculate 
the interactions of the linear and the quadratic comparisons with period. 
The sums of squares for these interactions are: 


$$T_L P = \frac{(-11)^2 + \cdots + (-30)^2}{(3)(2)} - T_L = 33.46^{**} \qquad (3 \text{ d.f.})$$

$$T_Q P = \frac{(-11)^2 + \cdots + (-12)^2}{(3)(6)} - T_Q = 0.59 \qquad (3 \text{ d.f.})$$


Rule 12.8.1. These calculations follow from a new rule. If a com-
parison $L_i$ has been computed at each of k different levels of a second
factor, the interaction S.S. of this comparison with the second factor is

$$\frac{\sum L_i^2}{n(\sum\lambda^2)} - \frac{(\sum L_i)^2}{kn(\sum\lambda^2)} \qquad (i = 1, 2, \ldots, k)$$

with (k − 1) d.f., n being the number of observations in each total from
which $L_i$ is computed. Further, the term $(\sum L_i)^2/kn(\sum\lambda^2)$ is the overall
S.S. (1 d.f.) for this comparison. The sum of $T_L P$ and $T_Q P$ is equal to the sum
of squares for TP. The linear regressions decrease significantly with






TABLE 12.8.2

Analysis of Variance of Ascorbic Acid in Snap Beans

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Temperature:                 (2)              (334.39)
  T_L                         1                               301.04**
  T_Q                         1                                33.35**
Period:                      (3)               (40.53)
  P_L                         1                                40.14**
  P_Q                         1                                 0.25
  P_C                         1                                 0.14
Interaction:                 (6)               (34.05)
  T_L P_L                     1                                33.08**
  T_L P_Q                     1                                 0.38
  T_L P_C                     1                                 0.01
  T_Q P_L                     1                                 0.14
  T_Q P_Q                     1                                 0.12
  T_Q P_C                     1                                 0.34
Error                        24                                 0.706


The sum is $T_Q P$ = 0.60. Clearly there is no change in $T_Q$ with period. The
results are collected in table 12.8.2.
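Rule 12.8.1 and the single-degree-of-freedom split of table 12.8.2 can be sketched as follows. The function name `interaction_ss` is ours. Each period total entering $T_L$ or $T_Q$ is taken over n = 3 packages, and the period coefficients are the four-level rows of table 12.6.2.

```python
def interaction_ss(L, n, sum_lam2):
    """Rule 12.8.1: interaction S.S. (k - 1 d.f.) of a comparison computed at
    each of k levels of a second factor, and its overall 1-d.f. S.S."""
    k = len(L)
    overall = sum(L) ** 2 / (k * n * sum_lam2)
    interaction = sum(x * x for x in L) / (n * sum_lam2) - overall
    return interaction, overall


TL = [-11, -19, -25, -30]   # linear temperature comparison at 2, 4, 6, 8 weeks
TQ = [-11, -11, -15, -12]   # quadratic temperature comparison
TLP, TL_ss = interaction_ss(TL, n=3, sum_lam2=2)
TQP, TQ_ss = interaction_ss(TQ, n=3, sum_lam2=6)
print(f"T_L P = {TLP:.2f}, T_Q P = {TQP:.3f}, sum = {TLP + TQP:.3f}")

# Split each interaction into linear, quadratic, cubic components in period
P_COEFS = {"P_L": [-3, -1, 1, 3], "P_Q": [1, -1, -1, 1], "P_C": [-1, 3, -3, 1]}
parts = {}
for pname, lam in P_COEFS.items():
    for tname, comps, s2 in (("T_L", TL, 2), ("T_Q", TQ, 6)):
        c = sum(l * x for l, x in zip(lam, comps))
        parts[(tname, pname)] = c * c / (3 * s2 * sum(l * l for l in lam))
for key, value in parts.items():
    print(*key, round(value, 3))
```

The exact value of $T_Q P$ is about 0.597, which the text reports as 0.59 and, as the sum of its three components, as 0.60.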

In summary, $T_L$ and $T_Q$ show that the relation of ascorbic acid to tem-
perature is parabolic, the rate of decline increasing as storage time
lengthens ($T_L P_L$). The regression on period is linear, sloping downward
more rapidly as temperature increases. In fact, you will note in
table 12.8.1 that at the coldest temperature, 0°F, there is no decline in 
amount of ascorbic acid with additional weeks of storage. 

These results can be expressed as a mathematical relation between
ascorbic acid Y, storage temperature T, and weeks of storage W. As we
have seen, we require terms in $T_L$, $T_Q$, $P_L$, and $T_L P_L$ in order to describe
the relation adequately. It is helpful to write down these polynomial
coefficients for each of the 12 treatment combinations, as shown in table
12.8.3.

For the moment, think of the mathematical relation as having the
form

$$\hat{Y} = \bar{Y} + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$$

where $\hat{Y}$ is the predicted ascorbic acid total over 3 replications, while
$X_1 = T_L$, $X_2 = T_Q$, $X_3 = P_L$, and $X_4 = T_L P_L$. The regression coefficient
$b_i = \sum X_i Y/\sum X_i^2$. The quantities $\sum X_i Y$, which were all obtained in the
earlier analysis, are given at the foot of table 12.8.3, as well as the divisors
$\sum X_i^2$. Hence, the relation is as follows:

$$\hat{Y} = 37.417 - 10.625X_1 - 2.042X_2 - 1.417X_3 - 1.575X_4 \qquad (12.8.1)$$

Since the values of the $X_i$ are given in table 12.8.3, the predicted
values $\hat{Y}$ are easily computed for each treatment combination. For ex-
ample, for 0°F. and 2 weeks storage,

$$\hat{Y} = 37.417 - (10.625)(-1) - 2.042(+1) - 1.417(-3) - (1.575)(+3) = 45.53,$$

as shown in the right-hand column of table 12.8.3.

TABLE 12.8.3

Calculation of the Response Surface

                         X1 = T_L      X2 = T_Q      X3 = P_L    X4 = T_L P_L
Temp.  Weeks  Y Totals   = 0.1(T-10)   = 3X1² - 2    = W - 5     = 0.1(T-10)(W-5)     Ŷ
0°       2       45          -1            +1           -3             +3           45.53
         4       47          -1            +1           -1             +1           45.84
         6       46          -1            +1           +1             -1           46.16
         8       46          -1            +1           +3             -3           46.47
10°      2       45           0            -2           -3              0           45.75
         4       43           0            -2           -1              0           42.92
         6       41           0            -2           +1              0           40.08
         8       37           0            -2           +3              0           37.25
20°      2       34          +1            +1           -3             -3           33.73
         4       28          +1            +1           -1             -1           27.74
         6       21          +1            +1           +1             +1           21.76
         8       16          +1            +1           +3             +3           15.77
ΣX_iY           449         -85           -49          -85            -63
Divisor for b    12           8            24           60             40
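Because the four X columns are mutually orthogonal and each sums to zero, every $b_i$ can be found separately as $\sum X_i Y/\sum X_i^2$. The sketch below builds the X columns from the codings at the top of table 12.8.3 and reproduces the coefficients of (12.8.1) and the Ŷ column. The variable names are ours, and the data are the totals of table 12.8.1.

```python
temps, weeks = [0, 10, 20], [2, 4, 6, 8]
Y = [[45, 47, 46, 46], [45, 43, 41, 37], [34, 28, 21, 16]]   # totals of 3

rows = []
for i, t in enumerate(temps):
    for j, w in enumerate(weeks):
        x1 = (t - 10) / 10          # T_L
        x2 = 3 * x1 ** 2 - 2        # T_Q
        x3 = w - 5                  # P_L
        x4 = x1 * x3                # T_L P_L
        rows.append(((x1, x2, x3, x4), Y[i][j]))

ybar = sum(y for _, y in rows) / len(rows)
b = []
for k in range(4):                  # orthogonal columns: one coefficient at a time
    sxy = sum(x[k] * y for x, y in rows)
    sxx = sum(x[k] ** 2 for x, _ in rows)
    b.append(sxy / sxx)

pred = [ybar + sum(bk * xk for bk, xk in zip(b, x)) for x, _ in rows]
print(round(ybar, 3), [round(bk, 3) for bk in b])
print([round(p, 2) for p in pred])
```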

By decoding, we can express the prediction equation (12.8.1) in terms
of T (°F.) and W (weeks). You may verify that the relations between
$X_1 (T_L)$, $X_2 (T_Q)$, $X_3 (P_L)$, $X_4 (T_L P_L)$ and T and W are as given at the top of
table 12.8.3. After making these substitutions and dividing by 3 so that
the prediction refers to the ascorbic acid mean per treatment combination,
we have

$$\hat{Y} = 15.070 + 0.3167T - 0.02042T^2 + 0.0528W - 0.05250TW \qquad (12.8.2)$$
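The decoding can be checked by evaluating both forms at the 12 treatment combinations. This sketch assumes the codings $X_1 = 0.1(T - 10)$, $X_2 = 3X_1^2 - 2$, $X_3 = W - 5$, $X_4 = X_1X_3$ from table 12.8.3. Small differences, of the order of 0.002, come only from the rounding of the printed coefficients.

```python
def y_mean(T, W):
    """Equation (12.8.2): predicted ascorbic acid mean (mg./100 g.)."""
    return 15.070 + 0.3167 * T - 0.02042 * T ** 2 + 0.0528 * W - 0.05250 * T * W


def y_coded(T, W):
    """Equation (12.8.1) on the coded scale, divided by 3 to give a mean."""
    x1 = (T - 10) / 10
    x2 = 3 * x1 ** 2 - 2
    x3 = W - 5
    x4 = x1 * x3
    total = 37.417 - 10.625 * x1 - 2.042 * x2 - 1.417 * x3 - 1.575 * x4
    return total / 3


for T in (0, 10, 20):
    for W in (2, 4, 6, 8):
        print(T, W, round(y_mean(T, W), 3), round(y_coded(T, W), 3))
```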

Geometrically, a relation of this type is called a response surface, since we 
have now a relation in three dimensions Y, T, and W. With quantitative 
factors, the summarization of the results by a response surface has proved 
highly useful, particularly in industrial research. If the objective of the 
research is to maximize Y, the equation shows the combinations of levels 
of the factors that give responses close to the maximum. Further accounts 
of this technique, with experimental plans specifically constructed for 
fitting response surfaces, are given in (7) and (8). The analysis in this 
example is based on (6). 

A word of warning. In the example we fitted a multiple regression
of Y on four variables $X_1$, $X_2$, $X_3$, $X_4$. The methods by which the regres-
sion coefficients $b_i$ were computed apply only if the $X_i$ are mutually
orthogonal, as was the case here. General methods are presented in
chapter 13.
12.9 — Three-factor experiments; the 2³. The experimenter often re-
quires evidence about the effects of 3 or more factors in a common en-
vironment. The simplest arrangement is that of 3 factors each at 2 levels,
the 2 x 2 x 2 or 2³ experiment. The eight treatment combinations may be
tried in any of the common experimental designs.

The data in table 12.9.1 are extracted from an unpublished random-
ized blocks experiment (9) to learn the effect of two supplements to a corn
ration for feeding pigs. The factors were as follows:

Lysine (L): 0 and 0.6%.

Soybean meal (P): Amounts added to supply 12% and 14% protein.

Sex (S): Male and Female.


TABLE 12.9.1

Average Daily Gains of Pigs in 2³ Factorial Arrangement of Treatments,
Randomized Blocks Experiment

Lysine  Protein  Sex          Replications (Blocks)                     Treatment  Sum for
%       %               1     2     3     4     5     6     7     8       Sum      2 Sexes
0       12       M    1.11  0.97  1.09  0.99  0.85  1.21  1.29  0.96     8.47
                 F    1.03  0.97  0.99  0.99  0.99  1.21  1.19  1.24     8.61      17.08
        14       M    1.52  1.45  1.27  1.22  1.67  1.24  1.34  1.32    11.03
                 F    1.48  1.22  1.53  1.19  1.16  1.57  1.13  1.43    10.71      21.74
0.6     12       M    1.22  1.13  1.34  1.41  1.34  1.19  1.25  1.32    10.20
                 F    0.87  1.00  1.16  1.29  1.00  1.14  1.36  1.32     9.14      19.34
        14       M    1.38  1.08  1.40  1.21  1.46  1.39  1.17  1.21    10.30
                 F    1.09  1.09  1.47  1.43  1.24  1.17  1.01  1.13     9.63      19.93
Replication Sum       9.70  8.91 10.25  9.73  9.71 10.12  9.74  9.93    78.09

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square
Replications                  7                0.1411
Treatments                    7                0.7986          0.1141**
Error                        49                1.0994          0.0224


With three factors there are three main effects, L, P, and S; three two- 
factor interactions, SP, SL, and LP; and a three-factor interaction SLP. 
The comparisons representing the factorial effect totals are set out in
table 12.9.2. The coefficients for the main effects and the two-factor inter-
actions should present no difficulty, these being the same as in a 2² fac-
torial. A useful rule in the 2ⁿ series is that the coefficients for any two-
factor interaction like SP are the products of the corresponding coeffi-
cients for the main effects S and P.

The new term is the three-factor interaction SLP. From table 12.9.2
the SP interaction (apart from its divisor) can be estimated at the higher
level of L as

$$10.20 - 9.14 - 10.30 + 9.63 = +0.39$$
TABLE 12.9.2

Seven Comparisons in 2³ Factorial Experiment on Pigs

               Lysine = 0                 Lysine = 0.6%            Factorial
               P = 12%      P = 14%       P = 12%      P = 14%       Effect    Sum of
Effects        M     F      M     F       M     F      M     F       Total     Squares
Totals       8.47  8.61  11.03 10.71   10.20  9.14  10.30  9.63
Sex, S         -1    +1     -1    +1      -1    +1     -1    +1      -1.91     0.0570
Protein, P     -1    -1     +1    +1      -1    -1     +1    +1      +5.25     0.4307**
SP             +1    -1     -1    +1      +1    -1     -1    +1      -0.07     0.0001
Lysine, L      -1    -1     -1    -1      +1    +1     +1    +1      +0.45     0.0032
SL             +1    -1     +1    -1      -1    +1     -1    +1      -1.55     0.0375
PL             +1    +1     -1    -1      -1    -1     +1    +1      -4.07     0.2588**
SPL            -1    +1     +1    -1      +1    -1     -1    +1      +0.85     0.0113
Total                                                                          0.7986


An independent estimate at the lower level of L is 

$$8.47 - 8.61 - 11.03 + 10.71 = -0.46$$

The sum of these two quantities, −0.07, is the factorial effect total for SP.
Their difference, +0.39 − (−0.46) = +0.85, measures the effect of the
level of L on the SP interaction. If we compute in the same way the effect 
of P on the SL interaction, or of S on the PL interaction, the quantity 
+0.85 is again obtained. It is called the factorial effect total for SLP. 
Such interactions are rather difficult to grasp. Fortunately, they are often 
negligible except in experiments that have large main effects. A significant 
three-factor interaction is a sign that the corresponding 3-way table of 
means must be examined in the interpretation of the results. 

As usual, the square of each factorial effect total is divided by $n(\sum\lambda^2)$,
where n = 8 and $\sum\lambda^2$ = 8, the denominator being 64 in every case. As a
check, the total of the sums of squares for the factorial effects in table 
12.9.2 must add to the Treatments sum of squares in table 12.9.1, 0.7986. 
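The seven factorial effect totals and the check against the Treatments S.S. can be sketched as follows. The coding is our assumption: −1/+1 for lysine 0/0.6%, protein 12/14%, and sex M/F. Interaction coefficients are formed as products of main-effect coefficients, as the text describes.

```python
from itertools import product

# Treatment totals over 8 blocks, keyed by (lysine, protein, sex) coded -1/+1
totals = {
    (-1, -1, -1): 8.47, (-1, -1, +1): 8.61,
    (-1, +1, -1): 11.03, (-1, +1, +1): 10.71,
    (+1, -1, -1): 10.20, (+1, -1, +1): 9.14,
    (+1, +1, -1): 10.30, (+1, +1, +1): 9.63,
}
n = 8   # replications

effects = {}
for use_l, use_p, use_s in product((0, 1), repeat=3):
    if not (use_l or use_p or use_s):
        continue                              # skip the grand total
    name = "S" * use_s + "P" * use_p + "L" * use_l
    # Interaction coefficients are products of the main-effect coefficients
    total = sum(
        y * (l if use_l else 1) * (p if use_p else 1) * (s if use_s else 1)
        for (l, p, s), y in totals.items()
    )
    effects[name] = (total, total ** 2 / (n * 8))   # divisor n * sum(lambda^2) = 64

for name, (total, ss) in sorted(effects.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(f"{name:4s} {total:+6.2f} {ss:.4f}")
print(f"sum = {sum(ss for _, ss in effects.values()):.4f}")
```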

The only significant effects are the main effect of P and the PL inter-
action. The totals for the P x L 2-way table are shown in the right-hand
column of table 12.9.1. With no added lysine, the higher level of protein
gave a substantially greater daily gain than the lower level, but with
added lysine, this gain was quite small. The result is not surprising, since
soybean meal contains lysine. Lysine increased the rate of gain at the
lower level of protein but decreased it at the higher level. 

In view of these results there is no interest in the main effects of P or of 
L. The experimenter has learned that gains can be increased either by a 
heavier addition of soybean meal or by the addition of lysine, whichever 
is more profitable; he should not add both. The absence of any interac-
tions involving S gives some assurance that these results hold for both
males and females. 
The 2ⁿ factorial experiment has proved a potent research weapon in
many fields. For further instruction on analysis, with examples, see (7),
(8), and (10).

12.10 — Three-factor experiments; a 2 x 3 x 4. This section illustrates
the general method of analysis for a three-factor experiment. The data
come from the experiment drawn on in the previous section. The factors
were Lysine (4 levels), Methionine (3 levels), and Soybean Meal (2 levels
of protein), as food supplements to corn in pig feeding. Only the males in
two replications are used. This makes a 2 x 3 x 4 factorial arrangement
of treatments in a randomized blocks design. Table 12.10.1 contains the
data, with the computations for the analysis of variance given in detail.

1. First form the sums for each treatment and replication, and com-
pute the total S.S. and the S.S. for treatments, replications, and error (by
subtraction).

2. For each pair of factors, form a two-way table of sums. From the
L x M table (table A) obtain the total S.S. (11 d.f.) and the S.S. for L and
M. The S.S. for the LM interactions is found by subtraction. The M x P
table supplies the S.S. for M (already obtained), for P, and for the MP
interactions (by subtraction). The L x P table provides the S.S. for the
LP interactions.

3. From the S.S. for treatments subtract the S.S. for L, M, P, LM,
MP, and LP to obtain that for the LMP three-factor interactions.

The analysis of variance appears in table 12.10.2, and a further
examination of the results in examples 12.10.1 to 12.10.3.
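The three steps above can be checked with the following pure-Python sketch. It assumes the 48 observations of table 12.10.1 arranged as `data[lysine][methionine][protein] = (rep 1, rep 2)`; the helper `ss_margin` is our own name. Each marginal S.S. is the sum of squared totals divided by the number of observations per total, minus the correction C.

```python
from itertools import product

# Table 12.10.1: data[l][m][p] = (rep 1, rep 2)
# lysine 0, .05, .10, .15; methionine 0, .025, .050; protein 12, 14
data = [
    [[(1.11, 0.97), (1.52, 1.45)], [(1.09, 0.99), (1.27, 1.22)], [(0.85, 1.21), (1.67, 1.24)]],
    [[(1.30, 1.00), (1.55, 1.53)], [(1.03, 1.21), (1.24, 1.34)], [(1.12, 0.96), (1.76, 1.27)]],
    [[(1.22, 1.13), (1.38, 1.08)], [(1.34, 1.41), (1.40, 1.21)], [(1.34, 1.19), (1.46, 1.39)]],
    [[(1.19, 1.03), (0.80, 1.29)], [(1.36, 1.16), (1.42, 1.39)], [(1.46, 1.03), (1.62, 1.27)]],
]
nl, nm, npr, nr = 4, 3, 2, 2
cells = list(product(range(nl), range(nm), range(npr)))
N = nl * nm * npr * nr
G = sum(sum(data[l][m][p]) for l, m, p in cells)
C = G ** 2 / N


def ss_margin(keep):
    """S.S. among totals classified by the factors in `keep` (0 = L, 1 = M,
    2 = P), each squared total divided by its number of observations."""
    sums = {}
    for cell in cells:
        key = tuple(cell[i] for i in keep)
        sums[key] = sums.get(key, 0.0) + sum(data[cell[0]][cell[1]][cell[2]])
    per_total = N // len(sums)
    return sum(s * s for s in sums.values()) / per_total - C


total = sum(y * y for l, m, p in cells for y in data[l][m][p]) - C
treat = ss_margin((0, 1, 2))
reps = sum(sum(data[l][m][p][r] for l, m, p in cells) ** 2 for r in range(nr)) / (N // nr) - C
L, M, P = ss_margin((0,)), ss_margin((1,)), ss_margin((2,))
LM = ss_margin((0, 1)) - L - M
LP = ss_margin((0, 2)) - L - P
MP = ss_margin((1, 2)) - M - P
LMP = treat - (L + M + P + LM + LP + MP)
error = total - treat - reps

anova = {"Replications": reps, "L": L, "M": M, "P": P, "LM": LM,
         "LP": LP, "MP": MP, "LMP": LMP, "Error": error}
for name, ss in anova.items():
    print(f"{name:12s} {ss:.4f}")
```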

EXAMPLE 12.10.1 — In table 12.10.2, for L, M, MP, and LMP the sums of squares
are all so small that no single degree of freedom isolated from them could reach significance.
But LM and LP deserve further study.

In the LM summary table A in table 12.10.1, there is some evidence of interaction,
though the overall test on 6 degrees of freedom doesn't detect it. Let us look at the linear
effects. First, calculate $M_L$ (−1, 0, +1) for each level of lysine:

$$-0.08, \quad -0.27, \quad 0.57, \quad 1.07$$

Next, take the linear effect of lysine (−3, −1, +1, +3) in these $M_L$; the result is 4.29. Finally,
application of Rule 12.8.2 yields the sum of squares

$$L_L M_L = \frac{(4.29)^2}{(4)(2)(20)} = 0.1150,$$

which is just short of significance at the 5% level. None of the other 5 comparisons is sig-
nificant. In the larger experiment of which this is a part, $L_L M_L$ was significant. What in-
terpretation do you suggest?

EXAMPLE 12.10.2 — In the LP summary table C, the differences between 14% and 12%,

$$2.15, \quad 2.07, \quad 0.29, \quad 0.56,$$

suggest an interaction: the beneficial effect of the higher level of protein decreases as more
lysine is added. By applying the multipliers −3, −1, +1, +3 to the above figures, we ob-
tain the $L_L P_L$ effect total = −6.55. By Rule 12.8.2,
TABLE 12.10.1

Three-Factor Experiment (2 x 3 x 4) in Randomized Blocks. Average Daily
Gains of Pigs Fed Various Percentages of Supplementary Lysine,
Methionine, and Protein

Lysine,  Methionine,  Protein,    Replications (Blocks)    Treatment
L        M            P              1         2             Total
0        0            12           1.11      0.97            2.08
                      14           1.52      1.45            2.97
         0.025        12           1.09      0.99            2.08
                      14           1.27      1.22            2.49
         0.050        12           0.85      1.21            2.06
                      14           1.67      1.24            2.91
0.05     0            12           1.30      1.00            2.30
                      14           1.55      1.53            3.08
         0.025        12           1.03      1.21            2.24
                      14           1.24      1.34            2.58
         0.050        12           1.12      0.96            2.08
                      14           1.76      1.27            3.03
0.10     0            12           1.22      1.13            2.35
                      14           1.38      1.08            2.46
         0.025        12           1.34      1.41            2.75
                      14           1.40      1.21            2.61
         0.050        12           1.34      1.19            2.53
                      14           1.46      1.39            2.85
0.15     0            12           1.19      1.03            2.22
                      14           0.80      1.29            2.09
         0.025        12           1.36      1.16            2.52
                      14           1.42      1.39            2.81
         0.050        12           1.46      1.03            2.49
                      14           1.62      1.27            2.89
Total                             31.50     28.97           60.47

Computations:

1. C = (60.47)²/48 = 76.1796

2. Total: 1.11² + 0.97² + ... + 1.62² + 1.27² − C = 2.0409

3. Treatments: (2.08² + 2.97² + ... + 2.89²)/2 − C = 1.2756

4. Replications: (31.50² + 28.97²)/24 − C = 0.1334

5. Error: 2.0409 − (1.2756 + 0.1334) = 0.6319

Summary Table A

                          Lysine
Methionine      0       0.05     0.10     0.15     Total
0             5.05      5.38     4.81     4.31     19.55
0.025         4.57      4.82     5.36     5.33     20.08
0.050         4.97      5.11     5.38     5.38     20.84
Total        14.59     15.31    15.55    15.02     60.47
Computations (continued):

6. Entries are sums of 2 levels of protein; 5.05 = 2.08 + 2.97, etc.

7. Total in A: (5.05² + ... + 5.38²)/4 − C = 0.3496

8. Lysine, L: (14.59² + ... + 15.02²)/12 − C = 0.0427

9. Methionine, M: (19.55² + 20.08² + 20.84²)/16 − C = 0.0526

10. LM: 0.3496 − (0.0427 + 0.0526) = 0.2543

Summary Table B

                  Protein
Methionine      12        14      Total
0              8.95     10.60     19.55
0.025          9.59     10.49     20.08
0.050          9.16     11.68     20.84
Total         27.70     32.77     60.47

Computations (continued):

11. Entries are sums of 4 levels of lysine; 8.95 = 2.08 + 2.30 + 2.35 + 2.22, etc.

12. Total in B: (8.95² + ... + 11.68²)/8 − C = 0.6702

13. Protein, P: (27.70² + 32.77²)/24 − C = 0.5355

14. MP: 0.6702 − (0.5355 + 0.0526) = 0.0821

Summary Table C

                          Lysine
Protein         0       0.05     0.10     0.15     Total
12            6.22      6.62     7.63     7.23     27.70
14            8.37      8.69     7.92     7.79     32.77
Total        14.59     15.31    15.55    15.02     60.47

Computations (continued):

15. Entries are sums of 3 levels of methionine; 6.22 = 2.08 + 2.08 + 2.06, etc.

16. Total in C: (6.22² + ... + 7.79²)/6 − C = 0.8181

17. LP: 0.8181 − (0.5355 + 0.0427) = 0.2399

18. LMP: 1.2756 − (0.0427 + 0.0526 + 0.5355 + 0.2543 + 0.0821 + 0.2399)
    = 0.0685


$$L_L P_L = \frac{(6.55)^2}{(6)(2)(20)} = 0.1788,$$

F = 0.1788/0.0275 = 6.50, P = 0.025. This corresponds to the highly significant effect ob-
served in table 12.9.2, where an interpretation was given.

Deducting $L_L P_L$ from the LP sum of squares in table 12.10.2, 0.2399 − 0.1788 = 0.0611,
shows that neither of the other two comparisons can be significant.
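The single-degree-of-freedom contrasts of examples 12.10.1 and 12.10.2 can be sketched from summary tables A and C. Each entry of table A is a total of 4 observations and each entry of table C a total of 6. In both cases the product coefficients have Σλ² = 2 × 20. The variable names below are ours.

```python
# Summary table A rows (lysine 0, .05, .10, .15) at methionine 0 and 0.050
A_m0 = [5.05, 5.38, 4.81, 4.31]
A_m050 = [4.97, 5.11, 5.38, 5.38]
# Summary table C rows at protein 12% and 14%
C_p12 = [6.22, 6.62, 7.63, 7.23]
C_p14 = [8.37, 8.69, 7.92, 7.79]
lys_lin = [-3, -1, 1, 3]
error_ms = 0.0275

# Example 12.10.1: M_L (-1, 0, +1) at each lysine level, then linear in lysine
m_lin = [hi - lo for lo, hi in zip(A_m0, A_m050)]
LLML = sum(c * v for c, v in zip(lys_lin, m_lin))
ss_LLML = LLML ** 2 / (4 * 2 * 20)

# Example 12.10.2: protein difference at each lysine level, then linear in lysine
p_diff = [hi - lo for lo, hi in zip(C_p12, C_p14)]
LLPL = sum(c * v for c, v in zip(lys_lin, p_diff))
ss_LLPL = LLPL ** 2 / (6 * 2 * 20)

print([round(v, 2) for v in m_lin], round(LLML, 2), round(ss_LLML, 4))
print([round(v, 2) for v in p_diff], round(LLPL, 2), round(ss_LLPL, 4),
      round(ss_LLPL / error_ms, 2))
```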

EXAMPLE 12.10.3 — The investigator is often interested in estimates of differences
rather than in tests of significance. Because of the LP interaction he might wish to estimate
the effect of protein with no lysine. Summary table C shows this mean difference:
TABLE 12.10.2

Analysis of Variance of 3-Factor Pig Experiment,
Randomized Blocks Design

Source of Variation      Degrees of Freedom   Sum of Squares   Mean Square
Replications                     1                0.1334
Lysine, L (l = 4)                3                0.0427          0.0142
Methionine, M (m = 3)            2                0.0526          0.0263
Protein, P (p = 2)               1                0.5355          0.5355**
LM                               6                0.2543          0.0424
LP                               3                0.2399          0.0800
MP                               2                0.0821          0.0410
LMP                              6                0.0685          0.0114
Error (r = 2)                   23                0.6319          0.0275


(8.37 − 6.22)/6 = 0.36 lb./day. (The justification for using all levels of methionine is that
there is little evidence of either main effect or interaction with protein.) The standard error
of the mean difference is $\pm\sqrt{(2)(0.0275)/6}$ = 0.096 lb./day. Verify that the 95% interval is
from 0.16 to 0.56 lb./day.
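The verification asked for in example 12.10.3 can be sketched as follows. Each entry of summary table C is a total over 3 methionine levels × 2 replications = 6 observations. The value t = 2.069 for 23 d.f. (5%, two-sided) is taken from t tables, not computed.

```python
import math

diff_total = 8.37 - 6.22            # table C, no lysine: 14% minus 12% protein
n_per_total = 6                     # 3 methionine levels x 2 replications
error_ms, t_05_23 = 0.0275, 2.069   # Error mean square; tabled t for 23 d.f.

d = diff_total / n_per_total
se = math.sqrt(2 * error_ms / n_per_total)
lo, hi = d - t_05_23 * se, d + t_05_23 * se
print(f"difference = {d:.2f} lb./day, s.e. = {se:.3f}, 95% interval {lo:.2f} to {hi:.2f}")
```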

12.11 — Expected values of mean squares. In the analysis of variance
of a factorial experiment, the expected values of the mean squares for 
main effects and interactions can be expressed in terms of components of 
variance that are part of the mathematical model underlying the analysis. 
These formulas have two principal uses. They show how to obtain un- 
biased estimates of error for the comparisons that are of interest. In 
studies of variability they provide estimates of the contributions made by 
different sources to the variance of a measurement. 

Consider a two-factor A x B experiment in a completely randomized
design, with a levels of A, b levels of B, and n replications. The observed
value for the kth replication of the ith level of A and the jth level of B is

$$X_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad (12.11.1)$$

where i = 1, ..., a; j = 1, ..., b; k = 1, ..., n. (If the plan is in randomized
blocks or a Latin square, further parameters are needed to specify block,
row, or column effects.)

The parameters $\alpha_i$ and $\beta_j$, representing main effects, may be fixed or
random. If either A or B is random, the corresponding $\alpha_i$ or $\beta_j$ are as-
sumed drawn from an infinite population with mean zero, variance $\sigma_A^2$
or $\sigma_B^2$. The $(\alpha\beta)_{ij}$ are the two-factor interaction effects. They are random
if either A or B is random, with mean 0, variance $\sigma_{AB}^2$. As usual, the $\varepsilon_{ijk}$
have mean 0, variance $\sigma^2$.

Before working out the expected value of the mean square for A, 
we must be clear about the meaning of main effects. The relevant and 
useful way of defining the main effect of A, and consequently the expected 
value of its mean square, depends on whether the other factor B is fixed or
random.
To illustrate the distinction, let A represent 2 fertilizers and B 2
fields. Experimental errors ε are assumed negligible, and results are as
follows:

                 Fertilizer
               a1      a2     a2 - a1
Field 1        10      17       +7
Field 2        18      13       -5
Mean           14      15       +1


When B is fixed, our question is: What is the average difference between
$a_2$ and $a_1$ over these two fields? The answer is that $a_2$ is superior by 1 unit
(15 − 14). The answer is exact, since experimental errors are negligible
in this example. But if B is random, the question becomes: What can
be inferred about the average difference between $a_2$ and $a_1$ over a population
of fields of which these two fields are a random sample? The difference
$(a_2 - a_1)$ is +7 in field 1 and −5 in field 2, with mean 1 as before.
The estimate is no longer exact, but has a standard error (with 1 d.f.),
which may be computed as $\sqrt{\{7 - (-5)\}^2/4} = \pm 6$. Note that this standard
error is derived from the AB interaction, this interaction being, in
fact, $\{7 - (-5)\}/2 = 6$.
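The arithmetic of this example can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and the yields are those in the table above.

```python
# Yields from the two-field example: field -> (a1, a2); errors assumed negligible.
yields = {1: (10, 17), 2: (18, 13)}

# Difference a2 - a1 in each field, and its mean over the two fields.
diffs = [a2 - a1 for a1, a2 in yields.values()]
mean_diff = sum(diffs) / len(diffs)

# With B random, the spread of the two differences (the AB interaction)
# supplies the error: s.e. of the mean = sqrt({d1 - d2}^2 / 4), on 1 d.f.
se_random_B = ((diffs[0] - diffs[1]) ** 2 / 4) ** 0.5

print(diffs, mean_diff, se_random_B)  # [7, -5] 1.0 6.0
```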

To sum up, the numerical estimates of the main effects of A are the 
same whether B is fixed or random, but the population parameters being 
estimated are not the same, and hence different standard errors are re- 
quired in the two cases. 

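From equation 12.11.1 the sample mean for the ith level of A is

$$\bar X_{i\cdot\cdot} = \mu + \alpha_i + \bar\beta + (\overline{\alpha\beta})_{i\cdot} + \bar\epsilon_{i\cdot\cdot} \qquad (12.11.2)$$

where $\bar\beta = (\beta_1 + \dots + \beta_b)/b$, $(\overline{\alpha\beta})_{i\cdot} = \{(\alpha\beta)_{i1} + \dots + (\alpha\beta)_{ib}\}/b$, and $\bar\epsilon_{i\cdot\cdot}$ is
the average of nb independent values of $\epsilon$.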

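When B is fixed, the true main effects of A are the differences of the
quantities $\{\alpha_i + (\overline{\alpha\beta})_{i\cdot}\}$ from level to level of A. In this case it is customary,
for simplicity of notation, to redefine the parameter $\alpha_i$ as
$\alpha_i' = \alpha_i + (\overline{\alpha\beta})_{i\cdot}$. Thus with B fixed, it follows from equation 12.11.2 that

$$\bar X_{i\cdot\cdot} - \bar X_{\cdot\cdot\cdot} = \alpha_i' - \bar\alpha' + \bar\epsilon_{i\cdot\cdot} - \bar\epsilon_{\cdot\cdot\cdot} \qquad (12.11.3)$$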

From this relation the expected value of the mean square for A is
easily shown to be

$$E(A) = E\left[\frac{nb\sum_i(\bar X_{i\cdot\cdot} - \bar X_{\cdot\cdot\cdot})^2}{a-1}\right] = \frac{nb\sum_i(\alpha_i' - \bar\alpha')^2}{a-1} + \sigma^2 \qquad (12.11.4)$$

The quantity $\sum_i(\alpha_i' - \bar\alpha')^2/(a-1)$ is the quantity previously denoted
by $\kappa_A^2$.

If A is random and B is fixed, repeated sampling involves drawing a
fresh set of a levels of the factor A in each experiment, retaining the same
set of b levels of B. In finding E(A) we average first over samples that
happen to give the same set of levels of A, this being a common device in
statistical theory. Formula 12.11.4 holds at this stage. When we average
further over all sets of a levels of A that can be drawn from the population,
$\kappa_A^2$ is an unbiased estimate of $\sigma_A^2$, the population variance of the $\alpha_i'$. Hence,
with A random and B fixed,

$$E(A) = nb\sigma_A^2 + \sigma^2$$

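Now consider B random and revert to equation 12.11.2:

$$\bar X_{i\cdot\cdot} = \mu + \alpha_i + \bar\beta + (\overline{\alpha\beta})_{i\cdot} + \bar\epsilon_{i\cdot\cdot} \qquad (12.11.2)$$

In each new sample we draw fresh values of $\beta_j$ and of $(\alpha\beta)_{ij}$, so that $\bar\beta$ and
$(\overline{\alpha\beta})_{i\cdot}$ change from sample to sample. Since, however, the population
means of $\bar\beta$, $(\overline{\alpha\beta})_{i\cdot}$, and $\bar\epsilon_{i\cdot\cdot}$ are all zero, the population mean of $\bar X_{i\cdot\cdot}$ is
$\mu + \alpha_i$. Consequently, the population variance of the main effects of A
is defined as $\kappa_A^2 = \sum_i(\alpha_i - \bar\alpha)^2/(a-1)$ if A is fixed, or as the variance
$\sigma_A^2$ of the $\alpha$'s if A is random. But since

$$\bar X_{i\cdot\cdot} - \bar X_{\cdot\cdot\cdot} = \alpha_i - \bar\alpha + (\overline{\alpha\beta})_{i\cdot} - (\overline{\alpha\beta})_{\cdot\cdot} + \bar\epsilon_{i\cdot\cdot} - \bar\epsilon_{\cdot\cdot\cdot}$$

the expected value of the mean square of A now involves $\sigma_{AB}^2$ as well as $\sigma^2$.
It follows that when B is random,

$$E(A) = nb\kappa_A^2 + n\sigma_{AB}^2 + \sigma^2 \quad \text{(A fixed)}$$

$$E(A) = nb\sigma_A^2 + n\sigma_{AB}^2 + \sigma^2 \quad \text{(A random)}$$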
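The mixed-model result can be checked by experimental sampling. The Python sketch below uses assumed values that are not from the text: three fixed levels of A with $\alpha = (-1, 0, 1)$, so that $\kappa_A^2 = 1$; b = 4 levels of B drawn afresh in every experiment; n = 2; and $\sigma_B = \sigma_{AB} = \sigma = 1$, all random terms being independent normal deviates. The formulas above then predict an average A mean square of $nb\kappa_A^2 + n\sigma_{AB}^2 + \sigma^2 = 8 + 2 + 1 = 11$, and an average AB mean square of $n\sigma_{AB}^2 + \sigma^2 = 3$. The function name is hypothetical.

```python
import random


def simulate_mean_squares(alpha, b, n, sigma_b, sigma_ab, sigma, reps, seed=2024):
    """Average the A and AB mean squares over `reps` experiments in which A is
    fixed (effects `alpha`) and B is random (fresh beta_j and (alpha beta)_ij
    drawn each time), following model 12.11.1 with mu = 0."""
    rng = random.Random(seed)
    a = len(alpha)
    total_ms_a = total_ms_ab = 0.0
    for _ in range(reps):
        beta = [rng.gauss(0.0, sigma_b) for _ in range(b)]  # fresh levels of B
        # Cell means: alpha_i + beta_j + (alpha beta)_ij + mean of n errors.
        cell = [[alpha[i] + beta[j] + rng.gauss(0.0, sigma_ab)
                 + sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
                 for j in range(b)] for i in range(a)]
        row = [sum(cells_i) / b for cells_i in cell]
        col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
        grand = sum(row) / a
        total_ms_a += n * b * sum((x - grand) ** 2 for x in row) / (a - 1)
        total_ms_ab += n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                               for i in range(a) for j in range(b)) / ((a - 1) * (b - 1))
    return total_ms_a / reps, total_ms_ab / reps


ms_a, ms_ab = simulate_mean_squares([-1.0, 0.0, 1.0], b=4, n=2, sigma_b=1.0,
                                    sigma_ab=1.0, sigma=1.0, reps=10000)
print(round(ms_a, 2), round(ms_ab, 2))  # compare with the predicted 11 and 3
```

Note that the $\beta_j$ cancel from every contrast among the A means, so $\sigma_B$ does not affect either average; it is included only to keep the sampling faithful to the model.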

The preceding results are particular cases of a more general formula.
If the population of levels of B is finite, containing B' levels of which b
are chosen at random for the experiment,

$$E(A) = nb\sigma_A^2 + n\left(\frac{B' - b}{B'}\right)\sigma_{AB}^2 + \sigma^2$$

This case occurs, for instance, if a combine of B' factories or cotton
growers carries out experiments in a random sample of b factories or
fields. If b = B' the term in $\sigma_{AB}^2$ vanishes and we regard factor B as fixed.
As B' tends to infinity, the coefficient of $\sigma_{AB}^2$ tends to n, factor B being
random. If A is fixed, $\sigma_A^2$ becomes $\kappa_A^2$.

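The AB mean square is derived from the sum of squares of the terms
$(\bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X_{\cdot\cdot\cdot})$. From the model, this term is

$$(\alpha\beta)_{ij} - (\overline{\alpha\beta})_{i\cdot} - (\overline{\alpha\beta})_{\cdot j} + (\overline{\alpha\beta})_{\cdot\cdot} + \bar\epsilon_{ij\cdot} - \bar\epsilon_{i\cdot\cdot} - \bar\epsilon_{\cdot j\cdot} + \bar\epsilon_{\cdot\cdot\cdot}$$

Unless both A and B are fixed, the interaction term in the above is a
random variable from sample to sample, giving

$$E(AB) = n\sigma_{AB}^2 + \sigma^2$$

With both factors fixed, $\sigma_{AB}^2$ is replaced by $\kappa_{AB}^2$. Table 12.11.1 summarizes
this series of results.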


TABLE 12.11.1

Expected Values of Mean Squares in a Two-factor Experiment
(Expected Value = Parameters Estimated)

Mean Square  Fixed Effects                Random Effects                               Mixed Model (A Fixed, B Random)
A            $\sigma^2 + nb\kappa_A^2$    $\sigma^2 + n\sigma_{AB}^2 + nb\sigma_A^2$   $\sigma^2 + n\sigma_{AB}^2 + nb\kappa_A^2$
B            $\sigma^2 + na\kappa_B^2$    $\sigma^2 + n\sigma_{AB}^2 + na\sigma_B^2$   $\sigma^2 + na\sigma_B^2$
AB           $\sigma^2 + n\kappa_{AB}^2$  $\sigma^2 + n\sigma_{AB}^2$                  $\sigma^2 + n\sigma_{AB}^2$
Error        $\sigma^2$                   $\sigma^2$                                   $\sigma^2$

Note that when B is random and the main effects of A are 0 ($\kappa_A^2$ or
$\sigma_A^2 = 0$), the mean square for A is an unbiased estimate of $\sigma^2 + n\sigma_{AB}^2$.
It follows that the appropriate denominator or “error” for an F-test of
the main effects of A is the AB Interactions mean square, as illustrated
from our sample of two fields. When B is fixed, the appropriate denominator
is the Error mean square in table 12.11.1.

General rules are available for factors A, B, C, D, . . . at levels 
a, b, c, d, . . . with n replications of each treatment combination. Any 
factors may be fixed or random. In presenting these rules, the symbol U 
denotes the factorial effect in whose mean square we are interested (for 
instance, the main effect of A, or the BC interaction, or the ACD inter- 
action). 

Rule 12.11.1. The expected value of the mean square for U contains
a term in $\sigma^2$ and a term in $\sigma_U^2$. It also contains a variance term for any
interaction in which (i) all the letters in U appear, and (ii) all the other letters
in the interaction represent random effects.

Rule 12.11.2. The coefficient of the term in $\sigma^2$ is 1. The coefficient
of any other variance is n times the product of all letters a, b, c, . . . that
do not appear in the set of capital letters A, B, C, . . . specifying the
variance.
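Because the two rules are mechanical, they can be coded directly. The Python function below is a sketch under the assumptions of this section (n replications of every treatment combination); the function name and the text labels it returns are our own. Following the convention of this section, it writes $\kappa^2$ (as `kappa`) for a component in which every factor is fixed and $\sigma^2$ otherwise.

```python
from itertools import combinations


def expected_mean_square(effect, levels, random_factors, n):
    """Apply Rules 12.11.1 and 12.11.2 to the mean square for `effect`
    (e.g. "C" or "AB"). `levels` maps factor letters to numbers of levels,
    `random_factors` is the set of letters whose levels are random.
    Returns (coefficient, component) pairs, the error term first."""
    factors = sorted(levels)
    target = set(effect)
    # Every main effect or interaction whose letters include all of `effect`.
    candidates = [
        "".join(combo)
        for size in range(len(effect), len(factors) + 1)
        for combo in combinations(factors, size)
        if target <= set(combo)
    ]
    terms = [(1, "sigma^2")]  # Rule 12.11.2: sigma^2 has coefficient 1
    for comp in sorted(candidates, key=len, reverse=True):
        # Rule 12.11.1: every letter beyond those in `effect` must be random.
        if not (set(comp) - target) <= random_factors:
            continue
        coef = n
        for f in factors:  # Rule 12.11.2: n times levels of the absent letters
            if f not in comp:
                coef *= levels[f]
        symbol = "sigma" if set(comp) & random_factors else "kappa"
        terms.append((coef, f"{symbol}_{comp}^2"))
    return terms


levels = {"A": 2, "B": 3, "C": 4}
print(expected_mean_square("C", levels, {"A", "B", "C"}, n=5))
print(expected_mean_square("C", levels, {"B", "C"}, n=5))
```

With a = 2, b = 3, c = 4, and n = 5, the two calls reproduce the pattern of the examples that follow: the all-random case gives coefficients n, nb, na, and nab on $\sigma_{ABC}^2$, $\sigma_{AC}^2$, $\sigma_{BC}^2$, and $\sigma_C^2$, while fixing A removes the $\sigma_{ABC}^2$ and $\sigma_{AC}^2$ terms.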

For example, consider the mean square for C in a three-way factorial.
If A and B are both random,

$$E(C) = \sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2 + na\sigma_{BC}^2 + nab\sigma_C^2$$

If A is fixed but B is random, the terms in $\sigma_{ABC}^2$ and $\sigma_{AC}^2$ drop out by
Rule 12.11.1, and we have

$$E(C) = \sigma^2 + na\sigma_{BC}^2 + nab\sigma_C^2$$

If A and B are both fixed, the expected value is

$$E(C) = \sigma^2 + nab\sigma_C^2$$

For main effects and interactions in which all factors are fixed, we have
followed the practice of replacing $\sigma^2$ by $\kappa^2$. Most writers use the symbol
$\sigma^2$ in either case. Table 12.11.2 illustrates the rules for three factors.


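TABLE 12.11.2

Expected Values of Mean Squares in a Three-Way Factorial

                                      Expected Values
Mean Squares  All Effects Fixed              All Effects Random
A             $\sigma^2 + nbc\kappa_A^2$     $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + nbc\sigma_A^2$
B             $\sigma^2 + nac\kappa_B^2$     $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + na\sigma_{BC}^2 + nac\sigma_B^2$
C             $\sigma^2 + nab\kappa_C^2$     $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2 + na\sigma_{BC}^2 + nab\sigma_C^2$
AB            $\sigma^2 + nc\kappa_{AB}^2$   $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$
AC            $\sigma^2 + nb\kappa_{AC}^2$   $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$
BC            $\sigma^2 + na\kappa_{BC}^2$   $\sigma^2 + n\sigma_{ABC}^2 + na\sigma_{BC}^2$
ABC           $\sigma^2 + n\kappa_{ABC}^2$   $\sigma^2 + n\sigma_{ABC}^2$
Error         $\sigma^2$                     $\sigma^2$

Mean Squares  A Fixed, B and C Random
A             $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + nbc\kappa_A^2$
B             $\sigma^2 + na\sigma_{BC}^2 + nac\sigma_B^2$
C             $\sigma^2 + na\sigma_{BC}^2 + nab\sigma_C^2$
AB            $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$
AC            $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$
BC            $\sigma^2 + na\sigma_{BC}^2$
ABC           $\sigma^2 + n\sigma_{ABC}^2$
Error         $\sigma^2$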

From these formulas, unbiased estimates of all the components of 
variance can be obtained as linear combinations of the mean squares in 
the analysis of variance. The null hypothesis that any component is 0 
can be tested, though complications may arise. Consider the null hypothesis
$\sigma_C^2 = 0$. Table 12.11.2 shows that if all effects are fixed, the
appropriate denominator for testing the mean square for C is the ordinary
Error mean square of the experiment. If A is fixed and B is random,
the appropriate denominator is the BC mean square.

If all effects are random, no single mean square in the analysis of
variance is an appropriate denominator for testing $\sigma_C^2$ (check with table
12.11.2). An approximate F-test is obtained as follows (11, 12). If
$\sigma_C^2 = 0$, you may verify from table 12.11.2 that

$$E(C) = E(AC) + E(BC) - E(ABC)$$

while if $\sigma_C^2$ is large, E(C) will exceed the right-hand side. A test criterion is

$$F' = \frac{(C) + (ABC)}{(AC) + (BC)}$$

where (C) denotes the mean square for C, and so on. The approximate
degrees of freedom are

$$n_1 = \frac{\{(C) + (ABC)\}^2}{(C)^2/f_c + (ABC)^2/f_{abc}}, \qquad n_2 = \frac{\{(AC) + (BC)\}^2}{(AC)^2/f_{ac} + (BC)^2/f_{bc}}$$

where $f_c$, $f_{abc}$, $f_{ac}$, and $f_{bc}$ are the degrees of freedom of the corresponding mean squares.
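The F' computation is short enough to sketch in Python. The mean squares in the usage lines are hypothetical numbers chosen only to show the arithmetic, for a = 3, b = 4, c = 5, so that $f_c = 4$, $f_{ac} = 8$, $f_{bc} = 12$, and $f_{abc} = 24$; the function name is our own. The resulting F' would be referred to the F table with $(n_1, n_2)$ degrees of freedom, rounded or interpolated as needed.

```python
def approx_f_test(ms_c, ms_abc, ms_ac, ms_bc, f_c, f_abc, f_ac, f_bc):
    """F' = {(C) + (ABC)} / {(AC) + (BC)}, with approximate d.f. n1 and n2."""
    numerator = ms_c + ms_abc
    denominator = ms_ac + ms_bc
    f_prime = numerator / denominator
    n1 = numerator ** 2 / (ms_c ** 2 / f_c + ms_abc ** 2 / f_abc)
    n2 = denominator ** 2 / (ms_ac ** 2 / f_ac + ms_bc ** 2 / f_bc)
    return f_prime, n1, n2


# Hypothetical mean squares for a = 3, b = 4, c = 5 (not from the text):
# f_c = 4, f_abc = 2*3*4 = 24, f_ac = 2*4 = 8, f_bc = 3*4 = 12.
f_prime, n1, n2 = approx_f_test(40.0, 10.0, 20.0, 15.0, 4, 24, 8, 12)
print(round(f_prime, 3), round(n1, 1), round(n2, 1))  # 1.429 6.2 17.8
```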

12.12 — The split-plot or nested design. It is often desirable to get pre- 
cise information on one factor and on the interaction of this factor with a 
second, but to forego such precision on the second factor. For example, 
three sources of vitamin might be compared by trying them on three males 
of the same litter, replicating the experiment on 20 litters. This would be a 
randomized blocks design with high precision, providing 38 degrees of 
freedom for error. Superimposed on this could be some experiment with 
the litters as units. Four types of housing could be tried, one litter to each 
type, thus allowing 5 replications with 12 degrees of freedom for error. 
The main treatments (housings) would not be compared as accurately as 
the sub-treatments (sources of vitamin) for two reasons: less replication
is provided, and litter differences are included in the error for evaluating 
the housing effects. Nevertheless, some information about housing may 
be got at little extra expense, and any interaction between housing and 
vitamin will be accurately evaluated. 

In experiments on varieties or fertilizers on small plots, cultural prac- 
tices with large machines may be tried on whole groups of the smaller 
plots, each group containing all the varieties. (Irrigation is one practice 
that demands large areas per treatment.) The series of cultural practices 
is usually replicated only a small number of times but the varieties are 
repeated on every cultural plot. Experiments of this type are called 
split-plot, the cultural main plot being split into smaller varietal sub-plots.

This design is also common in industrial research. Comparisons 
among relatively large machines, or comparisons of different conditions 
of temperature and humidity under which machines work, are main plot 
treatments, while adjustments internal to the machines are sub-plot treat- 
ments. Since the word plot is inappropriate in such applications, the 
designs are often called nested, in the sense of section 10.16.

The essential feature of the split-plot experiment is that the sub-plot
treatments are not randomized over the whole large block but only over 
the main plots. Randomization of the sub-treatments is newly done in 
each main plot and the main treatments are randomized in the large blocks. 
Fig. 12.12.1 — First 2 blocks of split-plot experiment on alfalfa, illustrating random
arrangement of main and sub-plots. 

A consequence is that the experimental error for sub-treatments is different
from (characteristically smaller than) that for main treatments.

Figure 12.12.1 shows the field layout of a split-plot design with three
varieties of alfalfa, the sub-treatments being four dates of final cutting (13).
The first two harvests were common to all plots, the second on July 27,
1943. The third harvests were: A, none; B, September 1; C, September 20;
D, October 7. Yields in 1944 are recorded in table 12.12.1. Such an experiment
is, of course, not evaluated by a single season's yields; statistical
methods for perennial crops are discussed in section 12.14.

In the analysis of variance the main plot analysis is that of random- 
ized blocks with three varieties replicated in six blocks. The sub-plot 
analysis contains the sums of squares for dates of cutting, for the date × variety
interactions, and for the sub-plot error, found by subtraction as
shown at the foot of table 12.12.2. 
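The ten steps at the foot of table 12.12.2 can be rerun from the yields in table 12.12.1. The following is a minimal Python sketch; the dictionary layout and the function name are our own, and the results should agree with the printed sums of squares to within rounding in the fourth decimal place.

```python
# Yields (tons per acre) from Table 12.12.1: variety -> date -> blocks 1..6.
DATA = {
    "Ladak":   {"A": [2.17, 1.88, 1.62, 2.34, 1.58, 1.66],
                "B": [1.58, 1.26, 1.22, 1.59, 1.25, 0.94],
                "C": [2.29, 1.60, 1.67, 1.91, 1.39, 1.12],
                "D": [2.23, 2.01, 1.82, 2.10, 1.66, 1.10]},
    "Cossack": {"A": [2.33, 2.01, 1.70, 1.78, 1.42, 1.35],
                "B": [1.38, 1.30, 1.85, 1.09, 1.13, 1.06],
                "C": [1.86, 1.70, 1.81, 1.54, 1.67, 0.88],
                "D": [2.27, 1.81, 2.01, 1.40, 1.31, 1.06]},
    "Ranger":  {"A": [1.75, 1.95, 2.13, 1.78, 1.31, 1.30],
                "B": [1.52, 1.47, 1.80, 1.37, 1.01, 1.31],
                "C": [1.55, 1.61, 1.82, 1.56, 1.23, 1.13],
                "D": [1.56, 1.72, 1.99, 1.55, 1.51, 1.33]},
}


def _sum_sq(xs):
    return sum(x * x for x in xs)


def split_plot_anova(data):
    """Varieties on main plots in randomized blocks, dates on sub-plots.
    Returns {source: (df, SS, MS)}, following steps 1-10 of Table 12.12.2."""
    varieties = list(data)
    dates = list(data[varieties[0]])
    m, t = len(varieties), len(dates)
    b = len(data[varieties[0]][dates[0]])

    values = [x for v in varieties for d in dates for x in data[v][d]]
    corr = sum(values) ** 2 / (m * t * b)                                  # 1
    total = _sum_sq(values) - corr                                         # 2
    main_tot = [sum(data[v][d][j] for d in dates) for v in varieties for j in range(b)]
    main_plots = _sum_sq(main_tot) / t - corr                              # 3
    var_tot = [sum(sum(data[v][d]) for d in dates) for v in varieties]
    varieties_ss = _sum_sq(var_tot) / (t * b) - corr                       # 4
    block_tot = [sum(data[v][d][j] for v in varieties for d in dates) for j in range(b)]
    blocks_ss = _sum_sq(block_tot) / (m * t) - corr                        # 5
    main_error = main_plots - (varieties_ss + blocks_ss)                   # 6
    cell_tot = [sum(data[v][d]) for v in varieties for d in dates]
    subclasses = _sum_sq(cell_tot) / b - corr                              # 7
    date_tot = [sum(sum(data[v][d]) for v in varieties) for d in dates]
    dates_ss = _sum_sq(date_tot) / (m * b) - corr                          # 8
    interaction = subclasses - (varieties_ss + dates_ss)                   # 9
    sub_error = total - (main_plots + dates_ss + interaction)              # 10

    table = {
        "Varieties": (m - 1, varieties_ss),
        "Blocks": (b - 1, blocks_ss),
        "Main plot error": ((m - 1) * (b - 1), main_error),
        "Dates of cutting": (t - 1, dates_ss),
        "Date x variety": ((m - 1) * (t - 1), interaction),
        "Sub-plot error": (m * (b - 1) * (t - 1), sub_error),
    }
    return {src: (df, ss, ss / df) for src, (df, ss) in table.items()}


for source, (df, ss, ms) in split_plot_anova(DATA).items():
    print(f"{source:<18}{df:>4}{ss:>10.4f}{ms:>10.4f}")
```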

The significant differences among dates of cutting were not unex- 
pected, nor were the smaller yields following B and C. The last harvest 
should be either early enough to allow renewed growth and restoration of 
the consequent depletion of root reserves, or so late that no growth and 
depletion will ensue. The surprising features of the experiment were two:
the yield following C being greater than that following B, since late September is usually
considered a poor time to cut alfalfa in Iowa; and the absence of inter- 
action between date and variety — Ladak is slow to renew growth after 
cutting and might have reacted differently from the other varieties. 

In order to justify this analysis we need to study the model. In 
randomized blocks, the model for the split-plot or nested experiment is 

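$$X_{ijk} = \mu + M_i + B_j + \epsilon_{ij} + T_k + (MT)_{ik} + \delta_{ijk}$$

where i = 1, . . ., m; j = 1, . . ., b; k = 1, . . ., t; $\epsilon_{ij} \sim \mathcal{N}(0, \sigma_\epsilon^2)$; and $\delta_{ijk} \sim \mathcal{N}(0, \sigma_\delta^2)$.

Here, M stands for main plot treatments, B for blocks, and T for sub-plot
treatments.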
TABLE 12.12.1

Yields of Three Varieties of Alfalfa (Tons Per Acre) in 1944 Following
Four Dates of Final Cutting in 1943

                                   Blocks
Variety    Date      1      2      3      4      5      6
Ladak       A      2.17   1.88   1.62   2.34   1.58   1.66
            B      1.58   1.26   1.22   1.59   1.25   0.94
            C      2.29   1.60   1.67   1.91   1.39   1.12
            D      2.23   2.01   1.82   2.10   1.66   1.10
           Total   8.27   6.75   6.33   7.94   5.88   4.82
Cossack     A      2.33   2.01   1.70   1.78   1.42   1.35
            B      1.38   1.30   1.85   1.09   1.13   1.06
            C      1.86   1.70   1.81   1.54   1.67   0.88
            D      2.27   1.81   2.01   1.40   1.31   1.06
           Total   7.84   6.82   7.37   5.81   5.53   4.35
Ranger      A      1.75   1.95   2.13   1.78   1.31   1.30
            B      1.52   1.47   1.80   1.37   1.01   1.31
            C      1.55   1.61   1.82   1.56   1.23   1.13
            D      1.56   1.72   1.99   1.55   1.51   1.33
           Total   6.38   6.75   7.74   6.26   5.06   5.07
Total             22.49  20.32  21.44  20.01  16.47  14.24

                              Date of Cutting
Variety                  A       B       C       D      Total
Ladak                  11.25    7.84    9.98   10.92    39.99
Cossack                10.59    7.81    9.46    9.86    37.72
Ranger                 10.22    8.48    8.90    9.66    37.26
Total                  32.06   24.13   28.34   30.44   114.97
Mean (tons per acre)    1.78    1.34    1.57    1.69

The symbols ij identify the main plot, while k identifies the sub-plot
within the main plot. The two components of error, $\epsilon_{ij}$ and $\delta_{ijk}$, are needed
to make the model realistic: the sub-plots in one main plot often yield
consistently higher than those in another, and $\epsilon_{ij}$ represents this difference.
From the model, the error of the mean difference between two main
plot treatments, say $M_1$ and $M_2$, is

$$\bar\epsilon_{1\cdot} - \bar\epsilon_{2\cdot} + \bar\delta_{1\cdot\cdot} - \bar\delta_{2\cdot\cdot}$$

The $\epsilon$'s are averages over b values, the $\delta$'s over bt values. Consequently,
the variance of the mean difference is

$$2\left(\frac{\sigma_\epsilon^2}{b} + \frac{\sigma_\delta^2}{bt}\right) = \frac{2}{bt}\left(\sigma_\delta^2 + t\sigma_\epsilon^2\right)$$

In the analysis of variance, the main plot Error mean square estimates
$(\sigma_\delta^2 + t\sigma_\epsilon^2)$.

TABLE 12.12.2

Analysis of Variance of Split-Plot Experiment on Alfalfa

Source of Variation     Degrees of Freedom   Sum of Squares   Mean Square
Main plots:
  Varieties                      2               0.1781          0.0890
  Blocks                         5               4.1499          0.8300
  Main plot error               10               1.3622          0.1362
Sub-plots:
  Dates of cutting               3               1.9625          0.6542**
  Date × variety                 6               0.2105          0.0351
  Sub-plot error                45               1.2586          0.0280

1. Correction: C = (114.97)²/72 = 183.5847
2. Total: (2.17)² + ... + (1.33)² − C = 9.1218
3. Main plots: [(8.27)² + ... + (5.07)²]/4 − C = 5.6902
4. Varieties: [(39.99)² + ... + (37.26)²]/24 − C = 0.1781
5. Blocks: [(22.49)² + ... + (14.24)²]/12 − C = 4.1499
6. Main plot error: 5.6902 − (0.1781 + 4.1499) = 1.3622
7. Sub-classes in variety-date table: [(11.25)² + ... + (9.66)²]/6 − C = 2.3511
8. Dates: [(32.06)² + ... + (30.44)²]/18 − C = 1.9625
9. Date × variety: 2.3511 − (0.1781 + 1.9625) = 0.2105
10. Sub-plot error: 9.1218 − (5.6902 + 1.9625 + 0.2105) = 1.2586

Consider now the difference $X_{ij1} - X_{ij2}$ between two sub-plots that
are in the same main plot. According to the model,

$$X_{ij1} - X_{ij2} = T_1 - T_2 + (MT)_{i1} - (MT)_{i2} + \delta_{ij1} - \delta_{ij2}$$

The error now involves only the $\delta$'s. Consequently, for any comparison
among treatments that is made entirely within main plots, the basic error
variance is $\sigma_\delta^2$, estimated by the sub-plot Error mean square. Such comparisons
include (i) the main effects of sub-plot treatments, (ii) interactions
between main-plot and sub-plot treatments, and (iii) comparisons
between sub-plot treatments for a single main-plot treatment (e.g., between
dates for Ladak).

In some experiments it is feasible to use either the split-plot design 
or ordinary randomized blocks in which the mt treatment combinations 
are randomized within each block. On the average, the two arrangements 
have the same overall accuracy. Relative to randomized blocks, the split- 
plot design gives reduced accuracy on the main-plot treatments and in- 
creased accuracy on sub-plot treatments and interactions. In some in- 
dustrial experiments conducted as split-plots, the investigator apparently 
did not realize the implications of the split-plot arrangement and analyzed 
the design as if it were in randomized blocks. The consequences were to 
assign too low errors to main-plot treatments and too high errors to sub- 
plot treatments. 


TABLE 12.12.3

Presentation of Treatment Means (Tons Per Acre) and Standard Errors

               Date of Cutting ($\pm\sqrt{E_b/b} = \pm 0.0683$)
Variety        A        B        C        D        Means ($\pm\sqrt{E_a/tb} = \pm 0.0753$)
Ladak        1.875    1.307    1.664    1.820      1.667
Cossack      1.765    1.302    1.577    1.644      1.572
Ranger       1.704    1.414    1.484    1.610      1.553
Means        1.781    1.341    1.575    1.691
             ($\pm\sqrt{E_b/mb} = \pm 0.0394$)

Care is required in the use of the correct standard errors for comparisons
among treatment means. Table 12.12.3 shows the treatment
means and s.e.'s for the alfalfa experiment, where $E_a = 0.1362$ and
$E_b = 0.0280$ denote the main- and sub-plot Error mean squares.
The s.e. ±0.0683, which is derived from $E_b$, is the basis for computing the
s.e. for comparisons that are part of the Variety-Date interactions and for
comparisons among dates for a single variety or a group of the varieties.
The s.e. ±0.0753 for varietal means is derived from $E_a$. Some comparisons,
for example those among varieties for Date A, require a standard
error that involves both $E_a$ and $E_b$, as described in (8).
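The standard errors in table 12.12.3 follow directly from $E_a$ and $E_b$, as the Python sketch below shows. The last line, for the difference between two varieties at the same date, uses the variance $2\{E_a + (t-1)E_b\}/(bt)$. We derived that expression from the model above (the variance of such a difference is $2(\sigma_\epsilon^2 + \sigma_\delta^2)/b$, with $E_a$ estimating $\sigma_\delta^2 + t\sigma_\epsilon^2$ and $E_b$ estimating $\sigma_\delta^2$) rather than copying it from (8), so treat it as an assumption to check against that reference.

```python
import math

E_a, E_b = 0.1362, 0.0280  # main- and sub-plot Error mean squares (Table 12.12.2)
m, b, t = 3, 6, 4          # varieties, blocks, dates of cutting

se_date_within_variety = math.sqrt(E_b / b)  # a variety-date mean, within-variety use
se_variety_mean = math.sqrt(E_a / (t * b))   # a variety mean (24 plots)
se_date_mean = math.sqrt(E_b / (m * b))      # a date mean (18 plots)

# Difference of two varieties at the same date (b plots per mean):
# sigma_eps^2 + sigma_delta^2 is estimated by {E_a + (t - 1) E_b} / t.
se_diff_varieties_same_date = math.sqrt(2 * (E_a + (t - 1) * E_b) / (b * t))

print(round(se_date_within_variety, 4), round(se_variety_mean, 4),
      round(se_date_mean, 4), round(se_diff_varieties_same_date, 4))
```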

Formally, the sub-plot error S.S. (45 d.f.) is the combined S.S. for the
BT interactions (15 d.f.) and the BMT interactions (30 d.f.). Often, it is
more realistic to regard Blocks as a random component rather than as a 
fixed component. In this case, the error for testing T is the BT mean 
square, while that for testing MT is the BMT mean square, if the two mean 
squares appear to differ. 

Experimenters sometimes split the sub-plots and even the sub-sub- 
plots. The statistical methods are a natural extension of those given here. 
If $T_1$, $T_2$, $T_3$ denote the sets of treatments at three levels, the set $T_1$ is
tested against the main-plot Error mean square, $T_2$ and the $T_1 \times T_2$ interac-
EXAMPLE 12.12.2 — Attention is attracted to the two significant interactions, IS and
IF. Now, ISF is less than error. This means that the IS interaction is much the same at all
levels of F; or, alternatively, that the IF interaction is similar at all levels of S. Hence, each
2-way table gives information.



                   F1      F2      F3      S1      S2      S3
Not Irrigated    1,047   1,058   1,099   1,064   1,134   1,006
Irrigated        1,183   1,356   1,437   1,162   1,353   1,461


Neither fertilizer nor stand affected yield materially on the non-irrigated plots. With 
irrigation, the effect of each was pronounced. So it is necessary to examine separately the 
split-plot experiment on the irrigated plots. Verify the following mean squares:

Source            Degrees of Freedom   Mean Square
Stand:
  Linear                  1               3,725**
  Deviations              1                  96
Error (a)                 6                 316
Fertilizer:
  Linear                  1               2,688**
  Deviations              1                 118
SF                        4                  92
Error (b)                18                 137


EXAMPLE 12.12.3 — Notice that the planting and fertilizer rates were well chosen for 
the unirrigated plots, but on the irrigated plots they were too low to allow any evaluation 
of the optima. This suggests that irrigation should not be a factor in such experiments. 
But in order to compare costs and returns over a number of years, two experiments (one with 
and one without irrigation) should be randomly interplanted to control fertility differences. 


12.13 — Series of experiments. A series of experiments may extend 
over several places or over several years or both. In a number of coun- 
tries in which the supply of food is deficient, such series have been under- 
taken in recent years on farmers’ fields in order to estimate the amount 
by which the production of food grains can be increased by greater use of 
fertilizers. 

Every series of experiments presents a unique problem for the ex- 
perimenter and the statistician, both in planning and analysis. Good 
presentations of the difficulties involved are in (15, 16, 17, 18), with illus- 
trations of the analysis. The methods given in this book should enable 
the reader to follow the references cited. Only a brief introduction to the 
analysis for experiments conducted at a number of places will be given 
here. 

We suppose that the experiments are all of the same size and structure, 
and that the places can be regarded as a random sample of the region about 
which inferences are to be made. For many reasons, a strictly random 
sample of places is difficult to achieve in practice: insofar as the sample 
is unrepresentative, inferences drawn from the analysis are vulnerable to 
bias. 

In the simplest case, the important terms in a combined analysis of
variance are:

Treatments
Treatments × Places
Pooled experimental errors

The Treatments × Places mean square is tested against the pooled error
(average of the Error mean squares in the individual experiments). If F
is materially greater than 1, indicating that treatment effects change from
place to place, the Treatments mean square is tested against the Treatments
× Places mean square, which becomes the basic error term for
drawing conclusions about the average effects of treatments over the
region.

Two complications occur. The experimental error variances often
differ from place to place. This can be checked by Bartlett’s test for
homogeneity of variance. If variances are heterogeneous, the F-test of
the Treatments × Places interactions is not strictly valid, but an adjusted
form of the test serves as an adequate approximation (15, 17). If comparisons
are being made over a subset of the places, as suggested later,
the pooled error for these places should be used instead of the overall
pooled error.

Secondly, the Treatments × Places interactions may not be homogeneous,
especially in a factorial experiment. Some factors may give
stable responses from place to place, while others are more erratic in their 
performance. If the Treatments mean square has been subdivided into 
sets of comparisons, the Interactions mean square for each set should be 
computed and tested separately. 

The preceding approach is appropriate where the objective is to 
reach a single set of conclusions that apply to the whole region. Some- 
times there is reason to expect that the relative performances of the treat- 
ments will vary with the soil type, with climatic conditions within the 
region, or with other characteristics of the places. The series may have 
been planned so as to examine such differences, leading perhaps to dif- 
ferent recommendations for different parts of the region. In the analysis, 
the places then subdivide into a number of sets. The Treatments × Places
interactions are separated into

Treatments × Sets
Treatments × Places within sets

If the Treatments × Sets mean square is substantially larger than Treatments
× Places within sets, it is usually advisable to examine the results
separately for each set.

The following examples illustrate the preliminary steps in the analy- 
sis of one series of experiments. 

EXAMPLE 12.13.1 — The following data illustrate a series of experiments over five
places (21). Four treated lots of 100 Mukden soybean seeds, together with one lot untreated,
were planted in 5 randomized blocks at each participating station. The total numbers of
emerging plants (from 500 seeds) are shown for the 5 locations. Also shown are the analyses
of variance at the several stations.

Number of Emerging Plants (500 Seeds) in Five Plots. Cooperative Seed
Treatment Trials With Mukden Soybeans, 1943

Location       Untreated   Arasan   Spergon   Semesan, Jr.   Fermate    Total
Michigan          360        356       362        350           373      1,801
Minnesota         302        354       349        332           332      1,669
Wisconsin         408        407       391        391           409      2,006
Virginia          244        267       293        235           278      1,317
Rhode Island      373        387       406        394           375      1,935
Total           1,687      1,771     1,801      1,702         1,767      8,728

Mean Squares From Original Analyses of Variance

Source of    Degrees of                          Location
Variation    Freedom      Michigan   Minnesota   Wisconsin   Virginia   Rhode Island
Treatments       4          14.44      82.84*      17.44      114.26*      37.50
Blocks           4         185.14      54.64        5.64       70.76        4.80
Error           16          42.29      26.67       30.64       26.34       13.05

Test the hypothesis of homogeneity of error variance. Ans. Corrected χ² = 5.22, d.f. = 4.
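Bartlett's test can be computed directly from the five Error mean squares, each on 16 d.f. The Python sketch below uses the usual natural-log form of Bartlett's statistic with its correction factor; the function name is our own.

```python
import math


def bartlett_chi_square(mean_squares, dfs):
    """Bartlett's test of homogeneity from k mean squares s_i^2 with f_i d.f.
    M = (sum f_i) ln(pooled) - sum f_i ln(s_i^2); corrected chi-square = M / C,
    C = 1 + [sum(1/f_i) - 1/sum(f_i)] / (3(k - 1)), on k - 1 d.f."""
    k = len(mean_squares)
    f_total = sum(dfs)
    pooled = sum(f * s2 for f, s2 in zip(dfs, mean_squares)) / f_total
    m_stat = f_total * math.log(pooled) - sum(
        f * math.log(s2) for f, s2 in zip(dfs, mean_squares))
    correction = 1 + (sum(1 / f for f in dfs) - 1 / f_total) / (3 * (k - 1))
    return m_stat / correction, k - 1


# Error mean squares: Michigan, Minnesota, Wisconsin, Virginia, Rhode Island.
error_ms = [42.29, 26.67, 30.64, 26.34, 13.05]
chi2, df = bartlett_chi_square(error_ms, [16] * 5)
print(round(chi2, 2), df)  # 5.22 4
```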

EXAMPLE 12.13.2 — For the entire soybean data, analyze the variance as follows:

Source of Variation     Degrees of Freedom   Sum of Squares   Mean Square
Treatments                      4                380.29           95.07
Locations                       4             11,852.61        2,963.15
Interaction                    16                685.63           42.85
Blocks in Locations            20              1,283.92
Experimental Error             80              2,223.68           27.80

Blocks and Experimental Error are pooled values from the analyses of the five places.


EXAMPLE 12.13.3 — Isolate the sum of squares for the planned comparison, Untreated
vs. Average of the four Treatments. Ans. 171.70, F = 4.01, $F_{.05}$ = 4.49.
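The first three lines of Example 12.13.2 and the comparison in Example 12.13.3 can be computed from the location × treatment totals alone, since each total is the sum of 5 plots; the Blocks and Experimental Error lines need the plot yields, which are not shown. A Python sketch follows (names are our own). The printed answer F = 4.01 corresponds to dividing the comparison's mean square by the Interaction mean square, 42.85, on 16 d.f.

```python
# Location x treatment totals (5 plots each), from the soybean table.
# Columns: Untreated, Arasan, Spergon, Semesan Jr., Fermate.
TOTALS = {
    "Michigan":     [360, 356, 362, 350, 373],
    "Minnesota":    [302, 354, 349, 332, 332],
    "Wisconsin":    [408, 407, 391, 391, 409],
    "Virginia":     [244, 267, 293, 235, 278],
    "Rhode Island": [373, 387, 406, 394, 375],
}
R = 5  # plots (blocks) behind each total


def combined_analysis(totals, r):
    """Treatments, Locations, and Interaction sums of squares on a per-plot basis."""
    cells = list(totals.values())
    p, k = len(cells), len(cells[0])
    corr = sum(map(sum, cells)) ** 2 / (r * p * k)
    treat_tot = [sum(row[j] for row in cells) for j in range(k)]
    place_tot = [sum(row) for row in cells]
    ss_treat = sum(x * x for x in treat_tot) / (r * p) - corr
    ss_place = sum(x * x for x in place_tot) / (r * k) - corr
    ss_cells = sum(x * x for row in cells for x in row) / r - corr
    return treat_tot, ss_treat, ss_place, ss_cells - ss_treat - ss_place


treat_tot, ss_treat, ss_place, ss_inter = combined_analysis(TOTALS, R)
df_inter = (len(treat_tot) - 1) * (len(TOTALS) - 1)
ms_inter = ss_inter / df_inter

# Untreated vs. the average of the four chemicals: coefficients 4, -1, -1, -1, -1.
coefs = [4, -1, -1, -1, -1]
q = sum(c * t for c, t in zip(coefs, treat_tot))
ss_contrast = q ** 2 / (R * len(TOTALS) * sum(c * c for c in coefs))
f_contrast = ss_contrast / ms_inter

# Compare with 380.29, 11,852.61, 685.63, 171.70, and F = 4.01.
print(round(ss_treat, 2), round(ss_place, 2), round(ss_inter, 2),
      round(ss_contrast, 2), round(f_contrast, 2))
```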

12.14 — Experiments with perennial crops. When a perennial crop is 
investigated over a number of years, the yields from the same plot in suc- 
cessive years are usually correlated: The experimental error in one season 
is not independent of that in another season. 

In comparing the overall yields of the treatments, this difficulty is
overcome by first finding for each plot the total yield over all years. These
totals are analyzed by the method appropriate to the design that was used.
This method provides a valid error for testing the overall treatment effects.

For illustration, the data in table 12.14.1 are taken from an experiment
by Haber (19) to compare the effects of various cutting treatments on
asparagus. Planting was in 1927 and cutting began in 1929. One plot
in each block was cut until June 1 in each year, others to June 15, July 1,
and July 15. The yields are for the four succeeding years, 1930, 1931,
1932, and 1933. The yields are the weights cut to June 1 in every plot,
irrespective of later cuttings in some of them. This weight is a measure
of vigor, and the objective is to compare the relative effectiveness of the
different harvesting plans.

A glance at the four-year totals (5,706; 5,166; 4,653; 3,075) leaves
little doubt that prolonged cutting decreased the vigor. The cutting totals
were separated into linear, quadratic, and cubic components of the regression
on duration of cutting. The significant quadratic component indicates
that the yields fall off more and more rapidly as the severity of
cutting increases.

TABLE 12.14.1

Weight (Ounces) of Asparagus Cut Before June 1 From Plots With
Various Cutting Treatments

                            Cutting Ceased
Blocks   Year     June 1   June 15   July 1   July 15    Total
1        1930       230      212      183       148        773
         1931       324      415      320       246      1,305
         1932       512      584      456       304      1,856
         1933       399      386      255       144      1,184
         Total    1,465    1,597    1,214       842      5,118
2        1930       216      190      186       126        718
         1931       317      296      295       201      1,109
         1932       448      471      387       289      1,595
         1933       361      280      187        83        911
         Total    1,342    1,237    1,055       699      4,333
3        1930       219      151      177       107        654
         1931       357      278      298       192      1,125
         1932       496      399      427       271      1,593
         1933       344      254      239        90        927
         Total    1,416    1,082    1,141       660      4,299
4        1930       200      150      209       168        727
         1931       362      336      328       226      1,252
         1932       540      485      462       312      1,799
         1933       381      279      244       168      1,072
         Total    1,483    1,250    1,243       874      4,850
Total             5,706    5,166    4,653     3,075     18,600

Source        Degrees of Freedom   Sum of Squares   Mean Square
Blocks                3                30,170
Cuttings             (3)             (241,377)
  Linear              1                                220,815**
  Quadratic           1                                 16,835*
  Cubic               1                                  3,727
Error                 9                                  2,429
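A Python sketch of this analysis of the four-year plot totals follows, with the cutting sum of squares split by the orthogonal polynomial coefficients for four equally spaced levels (linear −3, −1, 1, 3; quadratic 1, −1, −1, 1; cubic −1, 3, −3, 1). We found that the printed figures are reproduced, to within rounding, when every sum of squares of the totals is divided by 4, the number of years. That divisor is our inference, by analogy with the divisor 20 described for table 12.14.2, not something the text states. The function name is our own.

```python
# Four-year plot totals (ounces) from Table 12.14.1: one row per block;
# columns = cutting ceased June 1, June 15, July 1, July 15.
PLOT_TOTALS = [
    [1465, 1597, 1214, 842],
    [1342, 1237, 1055, 699],
    [1416, 1082, 1141, 660],
    [1483, 1250, 1243, 874],
]
POLY = {"Linear": [-3, -1, 1, 3], "Quadratic": [1, -1, -1, 1], "Cubic": [-1, 3, -3, 1]}


def rb_anova_with_polynomials(table, divisor=1.0):
    """Randomized-blocks analysis of one value per plot, with the treatment SS
    split into orthogonal polynomial components. All SS are divided by `divisor`."""
    r, k = len(table), len(table[0])
    corr = sum(map(sum, table)) ** 2 / (r * k)
    total = sum(x * x for row in table for x in row) - corr
    blocks = sum(sum(row) ** 2 for row in table) / k - corr
    treat_tot = [sum(row[j] for row in table) for j in range(k)]
    treats = sum(t * t for t in treat_tot) / r - corr
    out = {"Blocks": blocks / divisor, "Cuttings": treats / divisor}
    for name, c in POLY.items():
        q = sum(ci * ti for ci, ti in zip(c, treat_tot))
        out[name] = q ** 2 / (r * sum(ci * ci for ci in c)) / divisor
    out["Error MS"] = (total - blocks - treats) / divisor / ((r - 1) * (k - 1))
    return out


result = rb_anova_with_polynomials(PLOT_TOTALS, divisor=4)
for name, value in result.items():
    print(f"{name:<10}{value:>12,.0f}")
```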

Such experiments also contain information about the constancy of 
treatment differences from year to year, as indicated by the Treatments ×
Years interactions. Often it is useful to compute on each plot the linear
regression of yield on years, multiplying the yields in the four years by
−3, −1, +1, +3 and adding. These linear regressions (with an appropriate
divisor) measure the average rate of improvement of yield from
year to year. An analysis of the linear regressions for the asparagus data 
appears in table 12.14.2. From the totals for each treatment it is evident 
that the improvement in yield per year is greatest for the June 1 cutting, 
and declines steadily with increased severity of cutting, the July 15 cutting 
showing only a modest total, 119. 


TABLE 12.14.2

Analysis of the Linear Regression of Yield on Years

                   Cutting Ceased
Blocks   June 1   June 15   July 1   July 15    Total
1         695*      691      352        46      1,784
2         566       445       95       −41      1,065
3         514       430      315        28      1,287
4         721       536      239        86      1,582
Total   2,496     2,102    1,001       119      5,718

Source        Degrees of Freedom   Sum of Squares   Mean Square
Blocks                3                3,776
Cuttings             (3)              43,633          14,544**
  Linear              1                               42,354**
  Quadratic           1                                  744
  Cubic               1                                  536
Error                 9                2,236             248

* 695 = 3(399) + 512 − 324 − 3(230), from table 12.14.1


In the analysis of variance of these linear regression terms, the sum 
of squares between cuttings has been subdivided into its linear, quadratic, 
and cubic regression on duration. Only the linear term was strongly 
significant. Evidently, each additional two weeks of cutting produced 
about the same decrease in the annual rate of improvement of yield. 

In this analysis of variance an extra divisor $20 = 3^2 + 1^2 + 1^2 + 3^2$
was applied to each sum of squares, in order that the mean squares refer 
to a single observation. Can you explain why the Error mean square, 248, 
is so much smaller than the Error mean square for the four-year totals, 
2,429? Features of this experiment have been discussed by Snedecor 
and Haber (19, 20). 
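The per-plot regressions and their analysis can be sketched in Python the same way. The code forms the −3, −1, +1, +3 combination of the four annual yields on each plot and applies the divisor 20 to every sum of squares. The split over cuttings uses the standard polynomial coefficients for four equally spaced levels; we assume equal spacing because the cutting dates are about two weeks apart. The names are our own.

```python
# Yields (ounces) from Table 12.14.1: YIELDS[block][year] lists the cutting
# treatments June 1, June 15, July 1, July 15 for 1930, 1931, 1932, 1933.
YIELDS = [
    [[230, 212, 183, 148], [324, 415, 320, 246], [512, 584, 456, 304], [399, 386, 255, 144]],
    [[216, 190, 186, 126], [317, 296, 295, 201], [448, 471, 387, 289], [361, 280, 187, 83]],
    [[219, 151, 177, 107], [357, 278, 298, 192], [496, 399, 427, 271], [344, 254, 239, 90]],
    [[200, 150, 209, 168], [362, 336, 328, 226], [540, 485, 462, 312], [381, 279, 244, 168]],
]
YEAR_COEFS = [-3, -1, 1, 3]  # linear regression on years; sum of squares = 20
CUT_POLY = {"Linear": [-3, -1, 1, 3], "Quadratic": [1, -1, -1, 1], "Cubic": [-1, 3, -3, 1]}

# One regression value per plot, e.g. 695 = 3(399) + 512 - 324 - 3(230).
L = [[sum(c * block[y][j] for y, c in enumerate(YEAR_COEFS)) for j in range(4)]
     for block in YIELDS]


def analyze(table, divisor):
    """Randomized-blocks analysis of the per-plot values, all SS divided by `divisor`."""
    r, k = len(table), len(table[0])
    corr = sum(map(sum, table)) ** 2 / (r * k)
    total = sum(x * x for row in table for x in row) - corr
    blocks = sum(sum(row) ** 2 for row in table) / k - corr
    tt = [sum(row[j] for row in table) for j in range(k)]
    cuts = sum(t * t for t in tt) / r - corr
    res = {"Blocks": blocks / divisor, "Cuttings": cuts / divisor,
           "Error MS": (total - blocks - cuts) / divisor / ((r - 1) * (k - 1))}
    for name, c in CUT_POLY.items():
        q = sum(ci * ti for ci, ti in zip(c, tt))
        res[name] = q * q / (r * sum(ci * ci for ci in c)) / divisor
    return res


regression_table = analyze(L, divisor=20)
print(L)
print({name: round(v) for name, v in regression_table.items()})
```

Running both analyses side by side is one way to approach the question above: the Error here measures plot-to-plot variation in the rate of improvement, not in the overall level of yield.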
Multiple regression


13.1 — Introduction. The regression of Y on a single independent variable
(chapter 6) is often inadequate. Two or more X's may be available
to give additional information about Y by means of a multiple regression
on the X's. Among the principal uses of multiple regression are:

(1) Constructing an equation in the X's that gives the best prediction
of the values of Y.

(2) When there are many X's, finding the subset that gives the best
linear prediction equation. In predicting future weather conditions at an
airport, there may be as many as 50 available X-variables, which measure
different aspects of the present weather pattern at neighboring weather
stations. A prediction equation with 50 variables is unwieldy, and is unwise
if many of the X-variables contribute nothing to improved accuracy
in the prediction. An equation based on the best three or four variables
might be a wise choice.

(3) In some studies the objective is not prediction, but instead to 
discover which variables are related to Y, and, if possible, to rate the 
variables in order of their importance. 

Multiple regression is a complex subject. The calculations become 
lengthy when there are numerous 7-variables, and it is hard to avoid mis- 
takes in computation. Standard electronic computer programs, now be- 
coming more readily available, are a major help. Equally important is 
an understanding of what a multiple regression equation means and what 
it does not mean. Fortunately, much can be learned about the basis of 
the computations and the pitfalls in interpretation by study of a regression 
on two jf-variables, which will be considered in succeeding sections before 
proceeding to three or more X-variables. 

13.2 — Two independent variables. With only one X-variable, the sample values of Y and X could be plotted as in figures 6.2.1 and 6.4.1, which show both the regression line and the distributions of the individual values of Y about the line. But if Y depends partly on $X_1$ and partly on $X_2$ for its value, solid geometry instead of plane geometry is required. Any observation now involves three numbers: the values of Y, $X_1$, and $X_2$. The pair $(X_1, X_2)$ can be represented by a point on graph paper. The values of Y corresponding to this point are on a vertical axis perpendicular to the graph paper. In the population these values of Y form a frequency distribution, so we must try to envisage a frequency distribution of Y on each vertical axis. Each frequency distribution has a mean, the mean value of Y for specified $X_1, X_2$. The surface determined by these means is the regression surface. In this chapter the surface is a plane, since only linear regressions on $X_1$ and $X_2$ are being studied.

The population regression plane is written

$$Y_R = \alpha + \beta_1 X_1 + \beta_2 X_2,$$

where $Y_R$ denotes the mean value of the frequency distribution of Y for specified $X_1, X_2$. In mathematical notation, $Y_R = E(Y \mid X_1, X_2)$.

What does $\beta_1$ measure? Suppose that the value of $X_1$ increases by 1 unit, while the value of $X_2$ remains unchanged. $Y_R$ becomes

$$Y_R' = \alpha + \beta_1 X_1 + \beta_1 + \beta_2 X_2 = Y_R + \beta_1.$$

Thus, $\beta_1$ measures the average or expected change in Y when $X_1$ increases by 1 unit, $X_2$ remaining unchanged. For this reason $\beta_1$ is called the partial regression coefficient of Y on $X_1$. Some writers use a more explanatory symbol $\beta_{Y1\cdot2}$ for $\beta_1$, the subscript $\cdot2$ being a reminder that $X_2$ also appears in the regression equation.

For given $X_1, X_2$, the individual values of Y vary about the regression plane in a normal distribution with mean 0 and variance $\sigma^2$, sometimes denoted by $\sigma^2_{Y\cdot12}$. Hence, the model is

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon, \qquad \varepsilon = \mathcal{N}(0, \sigma) \tag{13.2.1}$$

Given a sample of n values of $(Y, X_1, X_2)$, the sample regression (the prediction equation) is

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 \tag{13.2.2}$$

The values of a, $b_1$, and $b_2$ are chosen so as to minimize $\sum (Y - \hat{Y})^2$, the sum of squares of the n differences between the actual and the predicted Y values. With our model, theory shows that the resulting estimates a, $b_1$, $b_2$, and $\hat{Y}$ are unbiased and have the smallest standard errors of any unbiased estimates that are linear expressions in the Y's. The value of a is given by the equation

$$a = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 \tag{13.2.3}$$

By substitution for a in (13.2.2) the fitted regression can be written

$$\hat{Y} = \bar{Y} + b_1 x_1 + b_2 x_2, \tag{13.2.4}$$

where $x_1 = X_1 - \bar{X}_1$, as usual.
The b's satisfy the normal equations:

$$b_1 \sum x_1^2 + b_2 \sum x_1 x_2 = \sum x_1 y \tag{13.2.5}$$

$$b_1 \sum x_1 x_2 + b_2 \sum x_2^2 = \sum x_2 y \tag{13.2.6}$$

Solution of these equations by standard algebraic methods leads to the formulas:

$$b_1 = \frac{(\sum x_2^2)(\sum x_1 y) - (\sum x_1 x_2)(\sum x_2 y)}{D} \tag{13.2.7}$$

and

$$b_2 = \frac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{D} \tag{13.2.8}$$

where

$$D = (\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2 \tag{13.2.9}$$
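Formulas (13.2.7) to (13.2.9) translate directly into code. The Python sketch below is an illustration, not part of the original computing scheme; the function name `solve_normal_equations` is hypothetical, and its inputs are the sums of squares and products of deviations listed under table 13.2.1, together with the rounded means used in (13.2.3).

```python
def solve_normal_equations(sx1x1, sx1x2, sx2x2, sx1y, sx2y):
    """Return D, b1, b2 from sums of squares and products of deviations.

    Implements formulas (13.2.7)-(13.2.9) for two X-variables.
    """
    d = sx1x1 * sx2x2 - sx1x2**2            # (13.2.9)
    b1 = (sx2x2 * sx1y - sx1x2 * sx2y) / d  # (13.2.7)
    b2 = (sx1x1 * sx2y - sx1x2 * sx1y) / d  # (13.2.8)
    return d, b1, b2


# Phosphorus data: sums of squares and products from table 13.2.1.
D, b1, b2 = solve_normal_equations(1752.96, 1085.61, 3155.78, 3231.48, 2216.44)
a = 81.28 - b1 * 11.94 - b2 * 42.11         # intercept from (13.2.3)
print(f"D = {D:,.0f}, b1 = {b1:.4f}, b2 = {b2:.4f}, a = {a:.2f}")
```

The results agree with the hand computation that follows, apart from the rounding of D to 4,353,400.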


The illustration (table 13.2.1) is taken from an investigation (1) of the source from which corn plants in various Iowa soils obtain their phosphorus. The concentrations of inorganic ($X_1$) and organic ($X_2$) phosphorus in the soils were determined chemically. The phosphorus content Y of corn grown in the soils was also measured.

The familiar calculations under the table give the sample means and the sums of squares and products of deviations from the means. Substitution in (13.2.7) to (13.2.9) gives

$$D = (1{,}752.96)(3{,}155.78) - (1{,}085.61)^2 = 4{,}353{,}400$$

$$b_1 = \frac{(3{,}155.78)(3{,}231.48) - (1{,}085.61)(2{,}216.44)}{4{,}353{,}400} = 1.7898$$

$$b_2 = \frac{(1{,}752.96)(2{,}216.44) - (1{,}085.61)(3{,}231.48)}{4{,}353{,}400} = 0.0866$$

From (13.2.3), a is given by

$$a = 81.28 - (1.7898)(11.94) - (0.0866)(42.11) = 56.26$$

The multiple regression equation becomes

$$\hat{Y} = 56.26 + 1.7898 X_1 + 0.0866 X_2 \tag{13.2.10}$$


The meaning is this: For each additional part per million of inorganic phosphorus in the soil at the beginning of the growing season, the phosphorus in the corn increased by 1.7898 ppm, as against 0.0866 ppm for each additional ppm of organic phosphorus. The suggestion is that the inorganic phosphorus in the soil was the chief source of plant-available phosphorus. This deduction needs further consideration (sections 13.3 and 13.5).
TABLE 13.2.1

Inorganic Phosphorus X₁, Organic Phosphorus X₂, and Estimated Plant-available Phosphorus Y in 18 Iowa Soils at 20° C. (Parts Per Million)

Soil Sample     X₁      X₂       Y        Ŷ       Y - Ŷ
     1          0.4     53      64      61.6*      2.4*
     2          0.4     23      60      59.0       1.0
     3          3.1     19      71      63.4       7.6
     4          0.6     34      61      60.3       0.7
     5          4.7     24      54      66.7     -12.7
     6          1.7     65      77      64.9      12.1
     7          9.4     44      81      76.9       4.1
     8         10.1     31      93      77.0      16.0
     9         11.6     29      93      79.6      13.4
    10         12.6     58      51      83.8     -32.8
    11         10.9     37      76      79.0      -3.0
    12         23.1     46      96     101.6      -5.6
    13         23.1     50      77     101.9     -24.9
    14         21.6     44      93      98.7      -5.7
    15         23.1     56      95     102.4      -7.4
    16          1.9     36      54      62.8      -8.8
    17         26.8     58     168     109.2      58.8
    18         29.9     51      99     114.2     -15.2
   Sum        215.0    758   1,463   1,463.0       0.0
   Mean        11.94  42.11   81.28

ΣX₁² = 4,321.02       ΣX₁X₂ = 10,139.50      ΣX₁Y = 20,706.20
   C = 2,568.06           C = 9,053.89          C = 17,474.72
Σx₁² = 1,752.96       Σx₁x₂ = 1,085.61       Σx₁y = 3,231.48

ΣX₂² = 35,076.00       ΣX₂Y = 63,825.00        ΣY² = 131,299.00
   C = 31,920.22          C = 61,608.56          C = 118,909.39
Σx₂² = 3,155.78        Σx₂y = 2,216.44         Σy² = 12,389.61

* The number of significant digits retained in the preceding calculations will affect these columns by ±0.1 or ±0.2.

From the fitted regression (equation 13.2.10), the predicted value $\hat{Y}$ can be estimated for each soil sample in table 13.2.1. For example, for soil 1,

$$\hat{Y} = 56.26 + 1.7898(0.4) + 0.0866(53) = 61.6 \text{ ppm}$$

The observed value Y = 64 ppm deviates by 64 - 61.6 = +2.4 ppm from the estimated regression value. The 18 values of $\hat{Y}$ are recorded in table 13.2.1. The deviations $Y - \hat{Y}$ are in the final column; they measure the failure of the X's to predict Y.
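To reproduce the last two columns of table 13.2.1, the regression can be refitted from the raw triplets. The Python sketch below is illustrative only; the helper name `fit_two_x` is hypothetical. Two entries, Y = 60 for soil 2 and $X_1$ = 10.9 for soil 11, are taken as the values implied by the column sums and printed deviations of table 13.2.1. Because the sketch uses unrounded means, its fitted values may differ from the table in the last digit, as the table footnote warns.

```python
# Table 13.2.1: (X1 inorganic P, X2 organic P, Y plant-available P), ppm
SOILS = [
    (0.4, 53, 64), (0.4, 23, 60), (3.1, 19, 71), (0.6, 34, 61),
    (4.7, 24, 54), (1.7, 65, 77), (9.4, 44, 81), (10.1, 31, 93),
    (11.6, 29, 93), (12.6, 58, 51), (10.9, 37, 76), (23.1, 46, 96),
    (23.1, 50, 77), (21.6, 44, 93), (23.1, 56, 95), (1.9, 36, 54),
    (26.8, 58, 168), (29.9, 51, 99),
]


def fit_two_x(rows):
    """Least-squares fit of Y on X1 and X2; returns (a, b1, b2)."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    d = s11 * s22 - s12**2
    b1 = (s22 * s1y - s12 * s2y) / d
    b2 = (s11 * s2y - s12 * s1y) / d
    return my - b1 * m1 - b2 * m2, b1, b2


a, b1, b2 = fit_two_x(SOILS)
fitted = [a + b1 * x1 + b2 * x2 for x1, x2, _ in SOILS]
deviations = [y - f for (_, _, y), f in zip(SOILS, fitted)]
for soil, (f, d) in enumerate(zip(fitted, deviations), start=1):
    print(f"soil {soil:2d}: Yhat = {f:6.1f}, Y - Yhat = {d:+6.1f}")
print(f"sum of deviations = {sum(deviations):.6f}")
print(f"sum of squares of deviations = {sum(e * e for e in deviations):.1f}")
```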

The investigator now has the opportunity to examine the deviations from regression. In part they might be associated with other variables not included in the study. Or some explanation might be found for certain deviations, especially the larger ones. Such explanation might be a valuable finding of the analysis, providing clues for further experimentation, or it might lead to the rejection of one or more observations and to a recalculation of the regression. In the present example the results for soil 17 immediately strike the eye. This soil has much the highest value of Y, 168. Before the regression was calculated, this value might not seem necessarily out of line (though it should be verified from the records), because soil 17 has the second highest value of both types of soil phosphorus, which could account for the high plant phosphorus. But this soil also has the highest deviation, $Y - \hat{Y} = +58.8$. A test of this deviation will be presented in section 13.5.

A check on the linearity of the regression is made by plotting two scatter diagrams. First, plot the deviations $Y - \hat{Y}$ against $X_1$; then plot the same deviations against $X_2$. If the regression is markedly non-linear in one of the X's, a curve instead of a horizontal straight line should be detectable in the corresponding graph. For curved multiple regression, see example 15.5.1.

13.3 — The deviations mean square and the F-test. In the multiple regression model, the deviations of the Y's from the population regression plane have mean 0 and variance $\sigma^2$. An unbiased estimate of $\sigma^2$ is $s^2 = \sum (Y - \hat{Y})^2/(n - k)$, where n is the size of sample and k is the number of parameters that have been estimated in fitting the regression. In the example n = 18 with 3 parameters $\alpha$, $\beta_1$, $\beta_2$, giving $n - k = 15$.

The deviations sum of squares $\sum (Y - \hat{Y})^2$ can be computed in two ways. If the individual deviations have been tabulated as in the last column of table 13.2.1, their sum of squares is run up directly, giving $\sum (Y - \hat{Y})^2 = 6{,}414.5$.

In practice a quicker method, based on an algebraic identity, is used. From equation (13.2.4) we had

$$\hat{Y} = \bar{Y} + b_1 x_1 + b_2 x_2$$

Since the sample means of $x_1$ and $x_2$ are both zero, the sample mean of the fitted values $\hat{Y}$ is $\bar{Y}$. Write $\hat{y} = \hat{Y} - \bar{Y}$ and $d = Y - \hat{Y}$, so that d represents the observed deviation of Y from the fitted regression at this point. It follows that

$$y = Y - \bar{Y} = (\hat{Y} - \bar{Y}) + (Y - \hat{Y}) = \hat{y} + d \tag{13.3.1}$$

Two important results, proved later in this section, are, first,

$$\sum y^2 = \sum \hat{y}^2 + \sum d^2 \tag{13.3.2}$$

This result states that the sum of squares of deviations of the Y's from their mean splits into two parts: (i) the sum of squares of deviations of the fitted values from their mean, and (ii) the sum of squares of deviations from the fitted values. The sum of squares $\sum \hat{y}^2$ is appropriately called "the sum of squares due to regression." In geometrical treatments of multiple regression, the relation (13.3.2) may be shown to be an extension of Pythagoras' theorem to more than two dimensions.

The second result, of more immediate interest, is:

$$\text{S.S. due to regression} = \sum \hat{y}^2 = b_1 \sum x_1 y + b_2 \sum x_2 y \tag{13.3.3}$$

Hence, the sum of squares of deviations from the regression may be obtained by subtracting from $\sum y^2$ the sum of products of the b's with the right sides of the corresponding normal equations. For the example we have

$$\sum \hat{y}^2 = (1.7898)(3{,}231.48) + (0.0866)(2{,}216.44) = 5{,}975.6$$

The value of $\sum d^2$ is then

$$\sum d^2 = \sum y^2 - \sum \hat{y}^2 = 12{,}389.6 - 5{,}975.6 = 6{,}414.0$$

Besides being quicker, this method is less subject to rounding errors than the direct method. Agreement of the two methods is an excellent check on the regression computations.

The mean square of the deviations is 6,414.0/15 = 427.6, with 15 d.f. The corresponding standard error, $\sqrt{427.6} = 20.7$, provides a measure of how closely the regression fits the data. If the purpose is to find a more accurate method of predicting Y, the size of this standard error is of primary importance. For instance, if current methods of predicting some critical temperature can do this with a standard error of 3.2 degrees, while a multiple regression gives a standard error of 4.7 degrees, it is obvious that the regression is no improvement on the current methods, though it might, after further study, be useful in conjunction with the current methods.

Sometimes the object of the regression analysis is to understand why Y varies, where the X's measure variables that are thought to influence Y through some causal mechanism. For instance, Y might represent the yields of a crop grown on the same field for a number of years under uniform husbandry, while the X's measure aspects of weather or insect infestation that influence crop yields (2). In such cases, it is useful to compare the Deviations mean square, $\sum d^2/(n - k)$, with the original mean square of Y, namely $\sum y^2/(n - 1)$. In our example the Deviations mean square is 427.6, while the original mean square is 12,389.61/17 = 728.8.

The ratio, 427.6/728.8 = 0.59, estimates the fraction of the variance of Y that is not attributable to the multiple regression, while its complement, 0.41, estimates the fraction that is "explained" by the X-variables. Even if the regression coefficients are clearly statistically significant, it is not uncommon to find that the fraction of the variance of Y attributable to the regression is much less than 1/2. This indicates that most of the variation in Y must be due to variables not included in the regression.

In some studies the investigator is not at all confident initially that any of the X's are related to Y. In this event an F-test of the null hypothesis $\beta_1 = \beta_2 = 0$ is helpful. The test is made from the analysis of variance in table 13.3.1. F is the ratio of the mean square due to regression to the Deviations mean square.

TABLE 13.3.1
Analysis of Variance of Phosphorus Data

Source of Variation   Degrees of Freedom   Sum of Squares     Mean Square   F
Regression                    2            Σŷ² = 5,975.6        2,987.8    6.99**
Deviations                   15            Σd² = 6,414.0          427.6
Total                        17            Σy² = 12,389.6         728.8

The F-value, 6.99, with 2 and 15 d.f., is significant at the 1% level.
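Table 13.3.1 and the variance fractions discussed above can be reproduced from the summary sums alone. The Python sketch below is an illustration: it takes the rounded coefficients and sums of the text as inputs and applies the shortcut (13.3.3); the variable names are hypothetical. The significance level of F would still be read from an F table.

```python
# Inputs from table 13.2.1 and equation (13.2.10), rounded as in the text.
N, K = 18, 3                      # observations; parameters a, b1, b2
SX1Y, SX2Y, SYY = 3231.48, 2216.44, 12389.61
B1, B2 = 1.7898, 0.0866

ss_reg = B1 * SX1Y + B2 * SX2Y    # (13.3.3): reduction due to regression
ss_dev = SYY - ss_reg             # (13.3.2) rearranged
ms_reg = ss_reg / (K - 1)
ms_dev = ss_dev / (N - K)
ms_tot = SYY / (N - 1)
F = ms_reg / ms_dev

print(f"Regression  {K - 1:2d}  {ss_reg:9.1f}  {ms_reg:7.1f}  F = {F:.2f}")
print(f"Deviations  {N - K:2d}  {ss_dev:9.1f}  {ms_dev:7.1f}")
print(f"Total       {N - 1:2d}  {SYY:9.1f}  {ms_tot:7.1f}")
print(f"fraction of variance not attributable to regression: {ms_dev / ms_tot:.2f}")
```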

By an extension of this analysis, tests of significance of the individual b's can be made. We have (from table 13.2.1)

$$\sum x_1 y = 3{,}231.48; \quad \sum x_1^2 = 1{,}752.96; \quad \sum x_2 y = 2{,}216.44; \quad \sum x_2^2 = 3{,}155.78$$

If we had fitted a regression of Y on $X_1$ alone, the regression coefficient would be $b_{Y1} = 3{,}231.48/1{,}752.96 = 1.8434$. The reduction in sum of squares due to this regression would be $(\sum x_1 y)^2/\sum x_1^2 = (3{,}231.48)^2/1{,}752.96 = 5{,}957.0$, with 1 d.f. When both $X_1$ and $X_2$ were included in the regression the reduction in sum of squares was 5,975.6, with 2 d.f. (table 13.3.1). The difference 5,975.6 - 5,957.0 = 18.6, with 1 d.f., measures the additional reduction due to the inclusion of $X_2$, given that $X_1$ is already present, or in other words the unique contribution of $X_2$ to the regression. The null hypothesis $\beta_2 = 0$ is tested by computing F = 18.6/427.6 = 0.04, with 1 and 15 d.f., where 427.6 is the deviations mean square. The test is shown in table 13.3.2. Since F is small, the null hypothesis is not rejected.

Similarly, the null hypothesis $\beta_1 = 0$ is tested by finding the additional reduction in sum of squares due to the inclusion of $X_1$ in the regression after $X_2$ has already been included (table 13.3.2). In this case F = 10.30 is significant at the 1% level.

TABLE 13.3.2
Test of Each X After the Effect of the Other Has Been Removed

Source of Variation   Degrees of Freedom   Sum of Squares             Mean Square   F
X₁ and X₂                     2            Σŷ² = 5,975.6
X₁ alone                      1            (Σx₁y)²/Σx₁² = 5,957.0
X₂ after X₁                   1            18.6                           18.6      0.04

X₁ and X₂                     2            Σŷ² = 5,975.6
X₂ alone                      1            (Σx₂y)²/Σx₂² = 1,556.7
X₁ after X₂                   1            4,418.9                     4,418.9      10.30**

Deviations                   15            6,414.0                       427.6

This method of testing a partial regression coefficient may appear strange at first, but is very general. If $\beta_k = 0$ when there are k X-variables, this means that the true model contains only $X_1 \ldots X_{k-1}$. We fit a regression on $X_1 \ldots X_{k-1}$, obtaining the reduction in sum of squares, $R_{k-1}$. Then we fit a regression on $X_1 \ldots X_k$, obtaining the reduction $R_k$. If $\beta_k = 0$, it can be proved that $(R_k - R_{k-1})$ is simply an estimate of $\sigma^2$, so that $F = (R_k - R_{k-1})/s^2$ should be about 1. If, however, $\beta_k$ is not zero, the inclusion of $X_k$ improves the fit and $(R_k - R_{k-1})$ tends to become large, so that F tends to become large. Later, we shall see that the same test can be made as a t-test of $b_k$.

Incidentally, it is worth comparing $b_{Y1} = 1.8434$ with the value $b_1 = b_{Y1\cdot2} = 1.7898$ obtained when $X_2$ is included in the regression. Two points are important. First, the value of the regression coefficient has changed. In multiple regression, the value of any regression coefficient depends on the other variables included in the regression. Statements made about the size of a regression coefficient are not unique, being conditional on these other variables. Secondly, in this case the change is small; this gives some assurance that this regression coefficient is stable. With $X_2$, we have $b_{Y2} = 2{,}216.44/3{,}155.78 = 0.7023$, much larger than $b_2 = b_{Y2\cdot1} = 0.0866$.
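The extra-sum-of-squares tests of table 13.3.2, and the simple regression coefficients just compared, can be scripted in a few lines. The Python sketch below is illustrative; `extra_ss_test` is a hypothetical helper, and the inputs are the rounded figures of the text. Small differences in the last digit from the hand calculation (for example in the F for $X_1$ after $X_2$) are rounding effects.

```python
SX1X1, SX2X2 = 1752.96, 3155.78    # sums of squares of deviations
SX1Y, SX2Y = 3231.48, 2216.44      # sums of products with y
SS_BOTH = 5975.6                   # reduction with X1 and X2 (table 13.3.1), 2 d.f.
MS_DEV = 427.6                     # deviations mean square, 15 d.f.


def extra_ss_test(ss_full, ss_reduced, ms_dev):
    """Extra reduction from adding one X (1 d.f.) and its F ratio."""
    extra = ss_full - ss_reduced
    return extra, extra / ms_dev


ss_x1_alone = SX1Y**2 / SX1X1      # regression on X1 only
ss_x2_alone = SX2Y**2 / SX2X2      # regression on X2 only
ss_x2_after_x1, f_x2 = extra_ss_test(SS_BOTH, ss_x1_alone, MS_DEV)
ss_x1_after_x2, f_x1 = extra_ss_test(SS_BOTH, ss_x2_alone, MS_DEV)

print(f"b_Y1 = {SX1Y / SX1X1:.4f}, b_Y2 = {SX2Y / SX2X2:.4f}")
print(f"X2 after X1: S.S. = {ss_x2_after_x1:7.1f}, F = {f_x2:.2f}")
print(f"X1 after X2: S.S. = {ss_x1_after_x2:7.1f}, F = {f_x1:.2f}")
```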

The remainder of this section is devoted to proofs of the basic results (13.3.2) and (13.3.3). Recall that

$$\hat{y} = \hat{Y} - \bar{Y} = b_1 x_1 + b_2 x_2; \qquad y = \hat{y} + d$$

$$d = y - b_1 x_1 - b_2 x_2$$

Start with the normal equations:

$$b_1 \sum x_1^2 + b_2 \sum x_1 x_2 = \sum x_1 y$$

$$b_1 \sum x_1 x_2 + b_2 \sum x_2^2 = \sum x_2 y$$

These may be rewritten in the form

$$\sum x_1 (y - b_1 x_1 - b_2 x_2) = \sum x_1 d = 0 \tag{13.3.4}$$

$$\sum x_2 (y - b_1 x_1 - b_2 x_2) = \sum x_2 d = 0 \tag{13.3.5}$$

These results show that the deviations d have zero sample correlations with any X-variable. This is not surprising, since d represents the part of Y that is not linearly related either to $X_1$ or to $X_2$.

Multiply (13.3.4) by $b_1$ and (13.3.5) by $b_2$ and add. Then

$$\sum (b_1 x_1 + b_2 x_2) d = \sum \hat{y} d = 0 \tag{13.3.6}$$

Now

$$\sum y^2 = \sum (\hat{y} + d)^2 = \sum \hat{y}^2 + 2 \sum \hat{y} d + \sum d^2 = \sum \hat{y}^2 + \sum d^2,$$

using (13.3.6). This proves the first result (13.3.2). To obtain the second result, we have

$$\sum \hat{y}^2 = \sum (b_1 x_1 + b_2 x_2)^2 = b_1^2 \sum x_1^2 + 2 b_1 b_2 \sum x_1 x_2 + b_2^2 \sum x_2^2$$

Reverting to the normal equations, multiply the first one by $b_1$, the second by $b_2$, and add. This gives

$$b_1^2 \sum x_1^2 + 2 b_1 b_2 \sum x_1 x_2 + b_2^2 \sum x_2^2 = b_1 \sum x_1 y + b_2 \sum x_2 y$$

This establishes (13.3.3), the shortcut method of computing the reduction $\sum \hat{y}^2$ in S.S. due to regression.
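The identities just proved can also be confirmed numerically. The Python sketch below is an illustration under stated assumptions: it uses the 18 soils of table 13.2.1, taking Y = 60 for soil 2 and $X_1$ = 10.9 for soil 11 as implied by that table's column sums, and evaluates (13.3.4) to (13.3.6), (13.3.2), and (13.3.3) in floating point. The helper `deviations_from_means` is hypothetical. Every printed quantity should be zero apart from rounding error.

```python
# Table 13.2.1: (X1, X2, Y) in ppm
SOILS = [
    (0.4, 53, 64), (0.4, 23, 60), (3.1, 19, 71), (0.6, 34, 61),
    (4.7, 24, 54), (1.7, 65, 77), (9.4, 44, 81), (10.1, 31, 93),
    (11.6, 29, 93), (12.6, 58, 51), (10.9, 37, 76), (23.1, 46, 96),
    (23.1, 50, 77), (21.6, 44, 93), (23.1, 56, 95), (1.9, 36, 54),
    (26.8, 58, 168), (29.9, 51, 99),
]


def deviations_from_means(rows):
    """Return the lists x1, x2, y of deviations from the sample means."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(3)]
    return [[r[j] - means[j] for r in rows] for j in range(3)]


x1, x2, y = deviations_from_means(SOILS)
s11 = sum(u * u for u in x1)
s22 = sum(v * v for v in x2)
s12 = sum(u * v for u, v in zip(x1, x2))
s1y = sum(u * w for u, w in zip(x1, y))
s2y = sum(v * w for v, w in zip(x2, y))
D = s11 * s22 - s12**2
b1 = (s22 * s1y - s12 * s2y) / D
b2 = (s11 * s2y - s12 * s1y) / D

yhat = [b1 * u + b2 * v for u, v in zip(x1, x2)]   # fitted values as deviations
d = [w - f for w, f in zip(y, yhat)]               # deviations from regression

checks = {
    "sum x1*d (13.3.4)": sum(u * e for u, e in zip(x1, d)),
    "sum x2*d (13.3.5)": sum(v * e for v, e in zip(x2, d)),
    "sum yhat*d (13.3.6)": sum(f * e for f, e in zip(yhat, d)),
    "sum y^2 - sum yhat^2 - sum d^2 (13.3.2)":
        sum(w * w for w in y) - sum(f * f for f in yhat) - sum(e * e for e in d),
    "sum yhat^2 - (b1*sum x1y + b2*sum x2y) (13.3.3)":
        sum(f * f for f in yhat) - (b1 * s1y + b2 * s2y),
}
for name, value in checks.items():
    print(f"{name:50s} {value: .2e}")
```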

EXAMPLE 13.3.1 — Here is a set of ten triplets for easy computation.

X₁   X₂    Y        X₁   X₂    Y
29    2   22        16    1   12
 1    4   26        26    1   13
 5    3   23        15    4   30
27    1   11         6    2   12
25    3   25        10    3   26

Sums: X₁ = 160, X₂ = 24, Y = 200

(i) Calculate the regression, $\hat{Y} = 0.241 X_1 + 6.829 X_2 - 0.239$.

(ii) Predict the value of Y for the fourth member of the sample ($X_1 = 27$, $X_2 = 1$). Ans. 13.09.

EXAMPLE 13.3.2 — In the preceding example, compute the total S.S. of Y and the S.S. due to regression. Hence, find the sum of squares of deviations. Ans. 35.0.

EXAMPLE 13.3.3 — Show that after allowing for the effects of the other variable, both $X_1$ and $X_2$ have a significant relation with Y.

EXAMPLE 13.3.4 — Note that when $X_1$ is fitted alone, the regression coefficient is negative; i.e., Y tends to decrease as $X_1$ increases. When $X_2$ is included, the coefficient $b_1$ becomes significantly positive. From the normal equations the following relation may be proved:

$$b_{Y1\cdot2} = b_{Y1} - b_{Y2\cdot1} b_{21},$$

where $b_{21} = \sum x_1 x_2/\sum x_1^2$ is the regression of $X_2$ on $X_1$. If $b_{Y2\cdot1}$ is positive and $b_{21}$ is negative, as in this example, the term $-b_{Y2\cdot1} b_{21}$ is positive. If this term is large enough it can change a negative $b_{Y1}$ into a positive $b_{Y1\cdot2}$.
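Examples 13.3.1 to 13.3.4 can be checked with the short Python sketch below, an illustration in which the helper `fit` is hypothetical. It lists the ten triplets in the order in which example 13.3.1 numbers them (the five in the left-hand column, then the five in the right-hand column), fits the regression, computes the deviations sum of squares and the F for each X after the other, and confirms the relation of example 13.3.4. The 5% point of F with 1 and 7 d.f., 5.59, is taken from standard F tables.

```python
DATA = [(29, 2, 22), (1, 4, 26), (5, 3, 23), (27, 1, 11), (25, 3, 25),
        (16, 1, 12), (26, 1, 13), (15, 4, 30), (6, 2, 12), (10, 3, 26)]


def fit(rows):
    """Return sums of squares/products of deviations and the fitted a, b1, b2."""
    n = len(rows)
    m1, m2, my = (sum(r[j] for r in rows) / n for j in range(3))
    x1 = [r[0] - m1 for r in rows]
    x2 = [r[1] - m2 for r in rows]
    y = [r[2] - my for r in rows]
    s = {
        "11": sum(u * u for u in x1), "22": sum(v * v for v in x2),
        "12": sum(u * v for u, v in zip(x1, x2)),
        "1y": sum(u * w for u, w in zip(x1, y)),
        "2y": sum(v * w for v, w in zip(x2, y)),
        "yy": sum(w * w for w in y),
    }
    d = s["11"] * s["22"] - s["12"] ** 2
    b1 = (s["22"] * s["1y"] - s["12"] * s["2y"]) / d
    b2 = (s["11"] * s["2y"] - s["12"] * s["1y"]) / d
    return s, my - b1 * m1 - b2 * m2, b1, b2


s, a, b1, b2 = fit(DATA)
ss_reg = b1 * s["1y"] + b2 * s["2y"]              # (13.3.3)
ss_dev = s["yy"] - ss_reg
ms_dev = ss_dev / (len(DATA) - 3)
f_x1_after_x2 = (ss_reg - s["2y"] ** 2 / s["22"]) / ms_dev
f_x2_after_x1 = (ss_reg - s["1y"] ** 2 / s["11"]) / ms_dev

b_y1 = s["1y"] / s["11"]     # X1 fitted alone
b_21 = s["12"] / s["11"]     # regression of X2 on X1
print(f"Yhat = {b1:.3f} X1 + {b2:.3f} X2 + ({a:.3f})")
print(f"total S.S. = {s['yy']:.1f}, regression S.S. = {ss_reg:.2f}, deviations S.S. = {ss_dev:.2f}")
print(f"F (X1 after X2) = {f_x1_after_x2:.2f}, F (X2 after X1) = {f_x2_after_x1:.2f}; 5% point 5.59")
print(f"b_Y1 = {b_y1:.3f}; b_Y1 - b2*b_21 = {b_y1 - b2 * b_21:.3f} = b1")
```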

13.4 — Alternative method of calculation. The inverse matrix. For many purposes, including the construction of confidence intervals for the $\beta$'s and the making of comparisons among the b's, some additional quantities must be computed. If it is known that these will be needed, the calculations given in preceding sections are usually altered slightly, as will be described.

On the left side of the normal equations, the quantities $\sum x_1^2$, $\sum x_1 x_2$, and $\sum x_2^2$ appear. The array

$$\begin{pmatrix} \sum x_1^2 & \sum x_1 x_2 \\ \sum x_1 x_2 & \sum x_2^2 \end{pmatrix}$$

is called a matrix with 2 rows and 2 columns, the matrix of sums of squares and products. Mathematicians have defined the inverse of this matrix, this being an extension to two dimensions of the concept of the reciprocal of a number. The inverse is also a 2 × 2 matrix:

$$\begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$$

The elements $c_{ij}$, called also the Gauss multipliers, are found by solving two sets of equations:

First set:

$$c_{11} \sum x_1^2 + c_{12} \sum x_1 x_2 = 1, \qquad c_{11} \sum x_1 x_2 + c_{12} \sum x_2^2 = 0$$

Second set:

$$c_{21} \sum x_1^2 + c_{22} \sum x_1 x_2 = 0, \qquad c_{21} \sum x_1 x_2 + c_{22} \sum x_2^2 = 1$$

The left side of each set is the same as that of the normal equations. The right sides have 1, 0, or 0, 1, respectively. The first set gives $c_{11}$ and $c_{12}$, the second set $c_{21}$ and $c_{22}$. It is easy to show that $c_{12} = c_{21}$.

In the 2 × 2 case the solutions are:

$$c_{11} = \frac{\sum x_2^2}{D}, \qquad c_{12} = c_{21} = -\frac{\sum x_1 x_2}{D}, \qquad c_{22} = \frac{\sum x_1^2}{D},$$

where, as before,

$$D = (\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2$$

Note that the numerator of $c_{11}$ is $\sum x_2^2$, not $\sum x_1^2$. Note also the negative sign in $c_{12}$.

In the example, the matrix of sums of squares and products was

$$\begin{pmatrix} 1{,}752.96 & 1{,}085.61 \\ 1{,}085.61 & 3{,}155.78 \end{pmatrix}$$

with D = 4,353,400. This gives

$$c_{11} = 3{,}155.78/4{,}353{,}400 = 0.0007249$$

$$c_{12} = -1{,}085.61/4{,}353{,}400 = -0.0002494$$

$$c_{22} = 1{,}752.96/4{,}353{,}400 = 0.0004027$$

From the c's, the b's are obtained as the sums of products of the c's with the right sides of the normal equations, as follows:

$$b_1 = c_{11} \sum x_1 y + c_{12} \sum x_2 y = (0.0007249)(3{,}231.48) + (-0.0002494)(2{,}216.44) = 1.7897 \tag{13.4.1}$$

$$b_2 = c_{21} \sum x_1 y + c_{22} \sum x_2 y = (-0.0002494)(3{,}231.48) + (0.0004027)(2{,}216.44) = 0.0866 \tag{13.4.2}$$

The main reason for finding the c's is that they provide the variances and the covariance of the b's. The formulas are:

$$V(b_1) = \sigma^2 c_{11}, \qquad V(b_2) = \sigma^2 c_{22}, \qquad \mathrm{Cov}(b_1, b_2) = \sigma^2 c_{12},$$

where $\sigma^2$ is the variance of the residuals of Y from the regression plane.

To summarize, if the c's are wanted, they are computed first from the normal equations; then the b's are computed from the c's as above. The deviations sum of squares and the analysis of variance follow as in section 13.3. Some uses of the c's are presented in the next section.
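The two sets of equations for the c's amount to inverting the 2 × 2 matrix of sums of squares and products. A minimal Python sketch, assuming the rounded sums of the phosphorus example (the function name `gauss_multipliers` is hypothetical), computes the c's, checks that they satisfy both sets of equations, and then obtains the b's by (13.4.1) and (13.4.2).

```python
def gauss_multipliers(s11, s12, s22):
    """Inverse of [[s11, s12], [s12, s22]]: returns (c11, c12, c22), with c21 = c12."""
    d = s11 * s22 - s12**2
    return s22 / d, -s12 / d, s11 / d


S11, S12, S22 = 1752.96, 1085.61, 3155.78
SX1Y, SX2Y = 3231.48, 2216.44

c11, c12, c22 = gauss_multipliers(S11, S12, S22)

# Each set: left side as in the normal equations, right side 1, 0 or 0, 1.
first_set = (c11 * S11 + c12 * S12, c11 * S12 + c12 * S22)
second_set = (c12 * S11 + c22 * S12, c12 * S12 + c22 * S22)

b1 = c11 * SX1Y + c12 * SX2Y   # (13.4.1)
b2 = c12 * SX1Y + c22 * SX2Y   # (13.4.2), using c21 = c12
print(f"c11 = {c11:.7f}, c12 = {c12:.7f}, c22 = {c22:.7f}")
print(f"first set -> {first_set[0]:.6f}, {first_set[1]:.6f}")
print(f"second set -> {second_set[0]:.6f}, {second_set[1]:.6f}")
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Carrying the c's to full precision gives $b_1$ = 1.7898; the 1.7897 of (13.4.1) reflects the c's rounded to four significant digits.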

EXAMPLE 13.4.1 — To prove the relations $b_1 = c_{11} \sum x_1 y + c_{12} \sum x_2 y$, $b_2 = c_{21} \sum x_1 y + c_{22} \sum x_2 y$, use these relations to substitute for $b_1$ and $b_2$ in terms of the c's in the left side of the first normal equation. Then show, by the first equation satisfied by the c's in each set, that this left side equals $\sum x_1 y$. Similarly, you can show that the left side of the second normal equation equals $\sum x_2 y$. This proves that the b's computed as above are solutions of the normal equations.

EXAMPLE 13.4.2 — Show (i) that $b_1$ and $b_2$ have zero correlation only if $\sum x_1 x_2 = 0$; (ii) that, in this event, the regression coefficient of Y on $X_1$ is the same whether $X_2$ is included in the regression or not. This is the condition that holds for the main effects of each factor in a factorial experiment.

13.5 — Standard errors of estimates in multiple regression. In section 13.3 we found that the deviations mean square $s^2$ was 427.6 with 15 d.f., giving s = 20.7. The standard errors of $b_1$ and $b_2$ are therefore

$$s_{b_1} = s\sqrt{c_{11}} = 20.7\sqrt{0.0007249} = (20.7)(0.0269) = 0.557 \tag{13.5.1}$$

$$s_{b_2} = s\sqrt{c_{22}} = 20.7\sqrt{0.0004027} = (20.7)(0.0201) = 0.416 \tag{13.5.2}$$

It can be proved that the quantity $(b_1 - \beta_1)/s_{b_1}$ is distributed as t with (n - k), or 15, d.f. The null hypothesis $\beta_1 = 0$ can be tested as usual:

$$t_1 = b_1/s_{b_1} = 1.7898/0.557 = 3.21^{**}$$

$$t_2 = b_2/s_{b_2} = 0.0866/0.416 = 0.21$$

These t-tests are identical to the F-tests of the same hypotheses made in table 13.3.2. Note that $(3.21)^2 = 10.30$ and $(0.21)^2 = 0.04$, these being the two values of F found in table 13.3.2.

Evidently in the population of soils that were sampled the fraction 
of inorganic phosphorus is the better predictor of the plant-available 
phosphorus. The experiment indicates “that soil organic phosphorus per 
se is not available to plants. Presumably, the organic phosphorus is of 
appreciable availability to plants only upon mineralization, and in the 
experiments the rate of mineralization at 20°C. was too low to be of mea- 
surable importance.” 

Confidence limits for any $\beta_i$ are found as usual. For $\beta_1$, 95% limits are

$$b_1 \pm t_{0.05} s_{b_1} = 1.790 \pm (2.131)(0.557) = 0.60 \text{ and } 2.98$$

Sometimes, comparisons among the $b_i$ are of interest. The standard error of any comparison $\sum L_i b_i$ is

$$s\sqrt{\sum L_i^2 c_{ii} + 2 \sum_{i<j} L_i L_j c_{ij}} \tag{13.5.3}$$

For example, the standard error of $(b_1 - b_2)$ is

$$s\sqrt{c_{11} + c_{22} - 2c_{12}} = (20.7)\sqrt{0.0007249 + 0.0004027 - 2(-0.0002494)} = (20.7)\sqrt{0.0016264} = 0.835$$
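The standard errors, t-values, and confidence limits of this section follow mechanically once s and the c's are known. The Python sketch below illustrates the arithmetic; it assumes the rounded values quoted in the text and uses the tabulated t value 2.131 for 15 d.f. rather than computing a t quantile. The helper `comparison_se` is hypothetical and implements (13.5.3) for any set of multipliers L.

```python
import math

S = math.sqrt(427.6)                    # deviations mean square, 15 d.f.
C = [[0.0007249, -0.0002494],           # Gauss multipliers c_ij
     [-0.0002494, 0.0004027]]
B = [1.7898, 0.0866]                    # b1, b2
T_05_15 = 2.131                         # 5% two-sided t with 15 d.f. (from the text)


def comparison_se(L, s=S, c=C):
    """Standard error of sum(L_i * b_i), formula (13.5.3)."""
    k = len(L)
    q = sum(L[i] ** 2 * c[i][i] for i in range(k))
    q += 2 * sum(L[i] * L[j] * c[i][j] for i in range(k) for j in range(i + 1, k))
    return s * math.sqrt(q)


se_b1 = comparison_se([1, 0])           # (13.5.1)
se_b2 = comparison_se([0, 1])           # (13.5.2)
t1, t2 = B[0] / se_b1, B[1] / se_b2
limits_b1 = (B[0] - T_05_15 * se_b1, B[0] + T_05_15 * se_b1)
se_diff = comparison_se([1, -1])        # standard error of b1 - b2

print(f"s_b1 = {se_b1:.3f}, s_b2 = {se_b2:.3f}")
print(f"t1 = {t1:.2f} (t1^2 = {t1**2:.2f}), t2 = {t2:.2f}")
print(f"95% limits for beta1: {limits_b1[0]:.2f} to {limits_b1[1]:.2f}")
print(f"s.e. of b1 - b2 = {se_diff:.3f}")
```

Writing the standard error of a single b as a comparison with L = (1, 0) or (0, 1) keeps one formula for all three cases.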

When the regression is constructed for purposes of prediction, we wish to know how accurately $\hat{Y}$ predicts the population mean of Y for specified values of $X_1$ and $X_2$. Call this mean $\mu$. For instance, we might predict the average weight of 11-year-old boys of specified height $X_1$ and chest girth $X_2$. The formula for the estimated standard error of $\hat{Y} = \hat{\mu}$ is

$$s_{\hat{\mu}} = s\sqrt{\frac{1}{n} + c_{11} x_1^2 + c_{22} x_2^2 + 2 c_{12} x_1 x_2} \tag{13.5.4}$$

Example: For the value of Y at the point $X_1 = 4.7$, $X_2 = 24$ (soil sample 5 in table 13.2.1): $x_1 = 4.7 - 11.9 = -7.2$, $x_2 = 24 - 42.1 = -18.1$; so the standard error of the estimate $\hat{Y}$ is

$$(20.7)\sqrt{1/18 + (0.0007249)(-7.2)^2 + (0.0004027)(-18.1)^2 + 2(-0.0002494)(-7.2)(-18.1)} = \pm 8.25 \text{ ppm}$$

Alternatively, $\hat{Y}$ may be used to predict the value of Y for an individual new member $Y'$ of the population (that is, one not included in the regression calculations). In this case,

$$s_{\hat{Y}'} = s\sqrt{1 + \frac{1}{n} + c_{11} x_1^2 + c_{22} x_2^2 + 2 c_{12} x_1 x_2} \tag{13.5.5}$$

This result is subject to the assumption that the new member comes from the same population as the original data. Unless the predictions satisfy this condition, the standard error should be regarded as tentative. It will be too low if the passage of time or changes in the environment have changed the values of the $\beta$'s. If numerous predictions are being made, a direct check on their accuracy should be made whenever possible.
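Formulas (13.5.4) and (13.5.5) differ only by the extra 1 under the square root. The Python sketch below is an illustration using the rounded regression, means, and c's of the phosphorus example; the function name `prediction` is hypothetical. It returns $\hat{Y}$ together with either standard error. Because it uses the means to two decimals instead of one, the standard error for soil 5 comes out a few hundredths above the 8.25 ppm of the hand calculation.

```python
import math

N = 18
S = math.sqrt(427.6)                             # deviations mean square, 15 d.f.
C11, C12, C22 = 0.0007249, -0.0002494, 0.0004027
MEAN_X1, MEAN_X2 = 11.94, 42.11
A, B1, B2 = 56.26, 1.7898, 0.0866                # equation (13.2.10)


def prediction(X1, X2, individual=False):
    """Return (Yhat, standard error) at (X1, X2).

    individual=False: s.e. of the estimated population mean, formula (13.5.4).
    individual=True:  s.e. for a single new member Y', formula (13.5.5).
    """
    x1, x2 = X1 - MEAN_X1, X2 - MEAN_X2
    q = 1 / N + C11 * x1**2 + C22 * x2**2 + 2 * C12 * x1 * x2
    if individual:
        q += 1.0                                 # add the variance of one new Y
    return A + B1 * X1 + B2 * X2, S * math.sqrt(q)


yhat5, se_mean5 = prediction(4.7, 24)                     # soil sample 5
yhat_new, se_new = prediction(14.6, 51, individual=True)  # new soil of example 13.5.2
print(f"soil 5: Yhat = {yhat5:.1f}, s.e. of mean = {se_mean5:.2f}")
print(f"new soil: Yhat = {yhat_new:.2f}, s.e. of prediction = {se_new:.1f}")
```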

Finally, the standard error of $(Y_i - \hat{Y}_i)$, where $Y_i$ is one of the observations from which the regression was computed, is $\sigma\sqrt{g}$, where

$$g = 1 - \frac{1}{n} - c_{11} x_1^2 - c_{22} x_2^2 - 2 c_{12} x_1 x_2 \tag{13.5.6}$$

However, if the deviation $(Y_i - \hat{Y}_i)$ has aroused attention because it looks suspiciously large, we cannot apply a t-test of the form $t = (Y_i - \hat{Y}_i)/(s\sqrt{g})$, for two reasons. The quantities $(Y_i - \hat{Y}_i)$ and s are not independent, since $(Y_i - \hat{Y}_i)^2$ is a part of the deviations S.S. Secondly, we must allow for the fact that $(Y_i - \hat{Y}_i)$ was picked out because it looks large.

A test can be made, as follows. The quantity

$$s'^2 = \left[\sum (Y - \hat{Y})^2 - (Y_i - \hat{Y}_i)^2/g\right]/(n - k - 1)$$

can be shown to be the mean square of the deviations obtained if the suspect $Y_i$ is omitted when fitting the regression. If $Y_i$ were a randomly chosen observation, the quantity $t' = (Y_i - \hat{Y}_i)/(s'\sqrt{g})$ would follow the t-distribution with (n - k - 1) d.f. To make approximate allowance for the fact that we selected the largest absolute deviation, we regard the deviation as significant at the 5% level if $t'$ is significant at the level 0.05/n. (This may require reference to detailed tables (3) of t.)

To illustrate, it was noted (section 13.2) that the deviation +58.8 for soil 17 is outstanding. The value of g is found to be 0.80047, while $\sum (Y - \hat{Y})^2$ is 6,414 (section 13.3) with 15 d.f. Hence,

$$s'^2 = \left[6{,}414 - \frac{(58.8)^2}{0.80047}\right]\Big/14 = (6{,}414 - 4{,}319)/14 = 150$$

$$t' = 58.8/\sqrt{(150)(0.80047)} = 5.36 \quad (14 \text{ d.f.})$$

Since 0.05/18 = 0.0028, the question is whether a value of 5.36 exceeds the 0.0028 level of t with 14 d.f. Appendix table A 4 shows that the 0.001 level of t is 4.140. The deviation is clearly significant after allowance for the fact that it is the largest. If the regression is recomputed with soil 17 excluded, the main conclusion is not altered. The value of $b_1$ drops to 1.290 but remains significant, while $b_2$ becomes -0.111 (non-significant).
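The outlier test can be carried out directly from the raw data, which also lets us confirm the statement that $s'^2$ equals the deviations mean square of the regression refitted without the suspect observation. The Python sketch below is illustrative only: it uses the soils of table 13.2.1 (again taking Y = 60 for soil 2 and $X_1$ = 10.9 for soil 11, as implied by the table's column sums), and the helper `fit` is hypothetical. Its g differs from 0.80047 in the fourth decimal because it carries the c's to full precision.

```python
import math

# Table 13.2.1: (X1, X2, Y) in ppm
SOILS = [
    (0.4, 53, 64), (0.4, 23, 60), (3.1, 19, 71), (0.6, 34, 61),
    (4.7, 24, 54), (1.7, 65, 77), (9.4, 44, 81), (10.1, 31, 93),
    (11.6, 29, 93), (12.6, 58, 51), (10.9, 37, 76), (23.1, 46, 96),
    (23.1, 50, 77), (21.6, 44, 93), (23.1, 56, 95), (1.9, 36, 54),
    (26.8, 58, 168), (29.9, 51, 99),
]


def fit(rows):
    """Fit Y on X1, X2; return means, Gauss multipliers, b's, and deviations S.S."""
    n = len(rows)
    m1, m2, my = (sum(r[j] for r in rows) / n for j in range(3))
    x1 = [r[0] - m1 for r in rows]
    x2 = [r[1] - m2 for r in rows]
    y = [r[2] - my for r in rows]
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    d = s11 * s22 - s12**2
    c11, c12, c22 = s22 / d, -s12 / d, s11 / d
    b1 = c11 * s1y + c12 * s2y
    b2 = c12 * s1y + c22 * s2y
    ss_dev = sum(w * w for w in y) - b1 * s1y - b2 * s2y
    return {"n": n, "means": (m1, m2, my), "c": (c11, c12, c22),
            "b": (b1, b2), "ss_dev": ss_dev}


K = 3                                   # parameters a, b1, b2
full = fit(SOILS)
i = 16                                  # soil 17 (lists count from 0)
m1, m2, my = full["means"]
c11, c12, c22 = full["c"]
b1, b2 = full["b"]
x1, x2 = SOILS[i][0] - m1, SOILS[i][1] - m2
dev = (SOILS[i][2] - my) - b1 * x1 - b2 * x2                          # Y_i - Yhat_i
g = 1 - 1 / full["n"] - c11 * x1**2 - c22 * x2**2 - 2 * c12 * x1 * x2  # (13.5.6)
s2_prime = (full["ss_dev"] - dev**2 / g) / (full["n"] - K - 1)
t_prime = dev / math.sqrt(s2_prime * g)

reduced = fit(SOILS[:i] + SOILS[i + 1:])                              # soil 17 omitted
ms_reduced = reduced["ss_dev"] / (reduced["n"] - K)
print(f"deviation = {dev:.1f}, g = {g:.5f}, s'^2 = {s2_prime:.1f}, t' = {t_prime:.2f}")
print(f"refit without soil 17: b1 = {reduced['b'][0]:.3f}, b2 = {reduced['b'][1]:.3f}, "
      f"deviations M.S. = {ms_reduced:.1f}")
print(f"level for the largest of n deviations: 0.05/{full['n']} = {0.05 / full['n']:.4f}")
```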

EXAMPLE 13.5.1 — In the phosphorus data, set 95% confidence limits for $\beta_2$. Ans. -0.79 to 0.97 ppm.

EXAMPLE 13.5.2 — For a new soil having $X_1 = 14.6$, $X_2 = 51$, predict the value of $Y'$ and give the standard error of your prediction. Ans. $\hat{Y} = 86.81$, s.e. = ±21.5 ppm, using formula 13.5.5.



EXAMPLE 13.5.3 — If $Y_i$ is one of the observations from which the regression was computed, the variance of $Y_i - \hat{Y}_i$ is (formula 13.5.6)

$$\sigma^2\left(1 - \frac{1}{n} - c_{11} x_1^2 - c_{22} x_2^2 - 2 c_{12} x_1 x_2\right)$$

If this expression is added over all the n sample values, we get

$$\sigma^2\left[n - 1 - c_{11} \sum x_1^2 - c_{22} \sum x_2^2 - 2 c_{12} \sum x_1 x_2\right]$$

From the equations for the c's, show that the above equals $\sigma^2(n - 3)$. This is one way of seeing that $\sum (Y - \hat{Y})^2$ has (n - 3) d.f.
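A numerical check of example 13.5.3 is easy to write. The Python sketch below is an illustration that uses the ten triplets of example 13.3.1; the helper name `g_values` is hypothetical. It computes g for every observation by (13.5.6) and confirms that the n values of g add to n - 3.

```python
DATA = [(29, 2, 22), (1, 4, 26), (5, 3, 23), (27, 1, 11), (25, 3, 25),
        (16, 1, 12), (26, 1, 13), (15, 4, 30), (6, 2, 12), (10, 3, 26)]


def g_values(rows):
    """Return g = 1 - 1/n - c11*x1^2 - c22*x2^2 - 2*c12*x1*x2 for each observation."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    x1 = [r[0] - m1 for r in rows]
    x2 = [r[1] - m2 for r in rows]
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    d = s11 * s22 - s12**2
    c11, c12, c22 = s22 / d, -s12 / d, s11 / d
    return [1 - 1 / n - c11 * u * u - c22 * v * v - 2 * c12 * u * v
            for u, v in zip(x1, x2)]


g = g_values(DATA)
print([round(v, 3) for v in g])
print(f"sum of g = {sum(g):.6f}; n - 3 = {len(DATA) - 3}")
```

The sum does not depend on the Y values at all, which is why the same result holds for any data with two X-variables.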

EXAMPLE 13.5.4 — With soil 17 omitted, we have

$$\sum X_1 = 188.2, \quad \sum X_2 = 700, \quad \sum Y = 1{,}295$$

$$\sum x_1^2 = 1{,}519.30, \quad \sum x_1 x_2 = 835.69, \quad \sum x_2^2 = 2{,}888.47$$

$$\sum x_1 y = 1{,}867.39, \quad \sum x_2 y = 757.47, \quad \sum y^2 = 4{,}426.48$$

Solve the normal equations and verify that $b_1 = 1.290$, $b_2 = -0.111$, deviations S.S.


13.6 — The interpretation of regression coefficients. In the many areas of research in which controlled experiments are not practicable, multiple regression analyses are extensively used in attempts to disentangle and measure the effects of different X-variables on some response Y. There are, however, important limitations on what can be learned from this technique in observational studies. While the discussion will be given for a regression on two X-variables, the conclusions apply also when there are more than two. The multiple linear regression model on which the analysis is based is

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon \tag{13.6.1}$$

where the residuals $\varepsilon$ are assumed to be distributed, independently of the X's, with zero mean and variance $\sigma^2$. (The assumption of normality of the $\varepsilon$'s is required for tests of significance, but not for the other standard properties of regression estimates.) We assume that the X's remain fixed in repeated sampling.

In an observational study the investigator looks for some suitable source in which he can measure or record a sample of the triplets $(X_1, X_2, Y)$. He may try to select the pairs $(X_1, X_2)$ according to some plan, for instance so as to ensure that both X's vary over a substantial range and that their correlation is not too high, though he is limited in this respect by what the available source can provide.

Difficulty arises because he can never be sure that there are not other X-variables related to Y in the population sampled. These may be variables that he thinks are unimportant, variables that are not feasible to measure or record, or variables unknown to him. Consequently, instead of (13.6.1) the correct regression model is likely to be of the form

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + \varepsilon'$$

where $X_3 \ldots X_k$ represent these additional variables, and k may be fairly large. To keep the algebra simple we replace the additional terms in the model, $\beta_3 X_3 + \cdots + \beta_k X_k$, by a single term $\beta_0 X_0$, which stands for the joint effect of all the terms omitted from the two-variable model. Thus the correct model is

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_0 X_0 + \varepsilon' \tag{13.6.2}$$

where $\varepsilon'$ represents that part of Y that is distributed independently of $X_1$, $X_2$, and $X_0$.

The investigator computes the sample regression of Y on X 1 and X 2 
as m preceding sections, obtaining the regression coefficients b x and b 2 . 
Under the correct model (13.6.2), it will be proved later that b 1 is an un- 
biased estimate, not of /? x , but of 

$$\beta_1 + \beta_0 b_{01\cdot 2} \qquad (13.6.3)$$

where $b_{01\cdot 2}$ is the sample regression coefficient of $X_0$ on $X_1$, after allowing for the effects of $X_2$. The bias term $\beta_0 b_{01\cdot 2}$ may be positive or negative, so $b_1$ may be either an overestimate or an underestimate of $\beta_1$. Because the bias in $b_1$ depends on variables that have not been measured, it is hard to judge how large it is.
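Relation (13.6.3) can be checked numerically. The sketch below is a minimal illustration with several assumptions. The values of $X_1$, $X_2$, $X_0$ and of the $\beta$'s are hypothetical. $\varepsilon'$ is set to zero, so the relation holds exactly in the sample rather than only on average. NumPy's least-squares routine stands in for the hand computations of the preceding sections. The helper name `slope_coeffs` is ours.

```python
import numpy as np


def slope_coeffs(y, X):
    """Least-squares slopes of y on the columns of X, fitted with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]  # drop the intercept


# Hypothetical fixed values of the two measured X's and the omitted X0.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
X0 = np.array([0.5, 1.5, 1.0, 3.0, 2.5, 4.0, 3.5, 5.0])
alpha, beta1, beta2, beta0 = 10.0, 2.0, -1.0, 3.0

# Model (13.6.2) with the residual eps' set to zero.
Y = alpha + beta1 * X1 + beta2 * X2 + beta0 * X0

b1, b2 = slope_coeffs(Y, np.column_stack([X1, X2]))         # X0 left out of the fit
b01_2, b02_1 = slope_coeffs(X0, np.column_stack([X1, X2]))  # X0 regressed on X1, X2

print("b1 =", b1, "  beta1 + beta0 * b01.2 =", beta1 + beta0 * b01_2)
```

Setting `beta0 = 0` in the sketch removes the discrepancy between `b1` and `beta1`, as (13.6.3) implies.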

For example, an investigator might try to estimate the effects of nitrogen and phosphorus fertilizers on the yield of a common farm crop by taking a sample of farms. On each field he records the crop yield $Y$ at the most recent harvest and the amounts $X_1$, $X_2$ of N and P per acre applied in that field. Suppose, however, that substantial amounts of fertilizer are used mainly by the more competent farmers. Then the fields on which $X_1$ and $X_2$ have high values will, in general, also have better soil, more potash fertilizer, superior drainage and tillage, more protection against insect and crop damage, and so on. If $\beta_0 X_0$ denotes the combined effect of these variables on yield, $X_0$ will be positively correlated with $X_1$ and $X_2$, so that $b_{01\cdot 2}$ will be positive. Further, $\beta_0$ will be positive if these practices increase yields. Thus the regression coefficients $b_1$ and $b_2$ will overestimate the increase in yield caused by additional amounts of N and P. This type of overestimation is likely whenever regression analysis is used to estimate the beneficial effects of an innovation in some process and the more capable operators are the ones who try out the innovation.

When the purpose is to find a regression formula that predicts $Y$ accurately, rather than to interpret individual regression coefficients, the bias in $b_1$ may actually be advantageous. Insofar as the unknown variables in $X_0$ are good predictors of $Y$ and are stably related to $X_1$, the regression value of $b_1$ is in effect trying to improve the prediction by capitalizing on these relationships. An artificial example shows this; $X_2$ is omitted for simplicity. Suppose that the correct model is $Y = 1 + 3X_0$. This implies two things about the correct model. (i) $X_1$ is useless as a predictor, since $\beta_1 = 0$. (ii) If $X_0$ could be measured, it would give perfect predictions, since the model has no residual term $\varepsilon'$. In the data of table 13.6.1 we have constructed an $X_1$ that is highly correlated with $X_0$. You may check that the prediction equation based on the regression of $Y$ on $X_1$,

$$\hat{Y}_1 = 2.5 + 3.5X_1,$$

gives good, although not perfect, predictions. Since $\beta_0 = 3$, $b_{01} = 7/6$, and $b_1 = 7/2$, the relation $b_1 = \beta_0 b_{01}$ is also verified.


TABLE 13.6.1

Artificial Example to Illustrate Prediction From an Incomplete Regression Model

Observation    X₀    Y = 1 + 3X₀    X₁    Ŷ₁      Y − Ŷ₁
1               1     4              0     2.5     +1.5
2               2     7              2     9.5     −2.5
3               4    13              3    13.0      0.0
4               6    19              5    20.0     −1.0
5               7    22              5    20.0     +2.0
Sum            20    65             15    65.0      0.0
Mean            4    13              3    13.0      0.0

Σx₁² = 18,   Σx₁y = 63,   Σx₀x₁ = 21,   b₁ = 63/18 = 3.5,   b₀₁ = 21/18 = 7/6
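The arithmetic of table 13.6.1 can be reproduced in a few lines. This Python sketch uses NumPy, which is our choice and not part of the text. As in the footnote to the table, it works with deviations from the means.

```python
import numpy as np

# Columns of table 13.6.1.
X0 = np.array([1, 2, 4, 6, 7], dtype=float)
X1 = np.array([0, 2, 3, 5, 5], dtype=float)
Y = 1 + 3 * X0                 # the "correct" model, with no residual term

# Deviations from the sample means.
x0, x1, y = X0 - X0.mean(), X1 - X1.mean(), Y - Y.mean()

b1 = (x1 @ y) / (x1 @ x1)      # regression of Y on X1
b01 = (x1 @ x0) / (x1 @ x1)    # regression of X0 on X1
a = Y.mean() - b1 * X1.mean()  # intercept of the prediction equation
Y_hat = a + b1 * X1

print(b1, b01, a)
print(Y_hat, Y - Y_hat)
```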
To return to studies in which the sizes of the regression coefficients are of primary interest, a useful precaution is to include in the regression any $X$-variable that seems likely to have a material effect on $Y$, even though this variable is not of direct interest. Note from formula 13.6.3 that no contribution to the bias in $b_1$ comes from $\beta_2$, since $X_2$ was included in the regression. Another strategy is to find, if possible, a source population in which $X$-variables not of direct interest have only narrow ranges of variation. The effect is to decrease $b_{01\cdot 2}$ (see example 13.6.1) and hence lessen the bias in $b_1$. It also helps to repeat the study in diverse populations that are subject to different $X_0$ variables. If $b_1$ and $b_2$ take stable values across these populations, that is reassurance that the biases are not major.

In many problems the variables $X_1$ and $X_2$ are thought to have causal effects on $Y$. We would like to learn by how much $Y$ will be increased (if beneficial) or decreased (if harmful) by a given change $\Delta X_1$ in $X_1$. The estimate of this amount suggested by the multiple regression equation is $b_1 \Delta X_1$. As we have seen, this quantity is actually an estimate of $(\beta_1 + \beta_0 b_{01\cdot 2})\Delta X_1$. Further, while we may be able to impose a change of amount $\Delta X_1$ in $X_1$, we may be unable to control other consequences of this change. These consequences may include changes $\Delta X_2$ in $X_2$ and $\Delta X_0$ in $X_0$. Thus, from model 13.6.2, the real effect of a change $\Delta X_1$ may be

$$\beta_1 \Delta X_1 + \beta_2 \Delta X_2 + \beta_0 \Delta X_0 \qquad (13.6.4)$$

Our estimate of this amount assumes that $X_1$ can be changed without producing a change in $X_2$, and it ignores the unknown variables; it approximates $(\beta_1 + \beta_0 b_{01\cdot 2})\Delta X_1$. If enough is known about the situation, a more realistic mathematical model can be constructed, perhaps involving a system of equations or path analysis (26, 27). In this way a better estimate of 13.6.4 might be made, but estimates of this type are always subject to hazard. Box (4) gives an excellent discussion of this problem in industrial work and remarks: “To find out what happens to a system when you interfere with it you have to interfere with it (not just passively observe it).”

To sum up, when it is important to find some way of increasing or decreasing $Y$, multiple regression analyses indicate which $X$-variables might be changed to accomplish this end. Our advance estimates of the effects of such changes on $Y$, however, may be wrong by substantial amounts. If these changes are to be imposed, we should plan, whenever feasible, a direct study of their effects on $Y$ so that false starts can be corrected quickly.

In controlled experiments these difficulties can be largely overcome. The investigator is able to impose the changes (treatments) whose effects he wishes to measure and to obtain direct measurements of their effects. The extraneous and unknown variables represented by $X_0$ are present just as in observational studies. But the device of randomization (5, 6) makes $X_0$ in effect independent of $X_1$ and $X_2$ in the probability sense. Thus $X_0$ acts like the residual term $\varepsilon$ in the standard regression model, and the assumptions of this model are more nearly satisfied. If the effects of $X_0$ are large, the Deviations mean square, which is used as the estimate of error, will be large, and the experiment may be too imprecise to be useful. A large error variance should lead the investigator to study the uncontrolled sources of variation in order to find a way of doing more accurate experimentation.

We conclude this section with a proof of the result (equation 13.6.3). Suppose a regression of $Y$ on $X_1$ and $X_2$ is computed under the model

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_0 X_0 + \varepsilon', \qquad E(\varepsilon') = 0.$$

Then


$$E(b_1) = \beta_1 + \beta_0 b_{01\cdot 2} \qquad (13.6.3)$$

The result is important in showing that a regression coefficient is free from any bias due to other $X$'s like $X_2$ that are included in the fitted regression, but is subject to bias from $X$'s that were omitted. Since it is convenient to work with deviations from the sample means, note that from the model we have


$$y = Y - \bar{Y} = \beta_1 x_1 + \beta_2 x_2 + \beta_0 x_0 + \varepsilon' - \bar{\varepsilon}' \qquad (13.6.5)$$

Now,

$$b_1 = c_{11}\sum x_1 y + c_{12}\sum x_2 y$$
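Substitute for $y$ from 13.6.5:

$$\begin{aligned} b_1 = {}& c_{11}\sum x_1(\beta_1 x_1 + \beta_2 x_2 + \beta_0 x_0 + \varepsilon' - \bar{\varepsilon}') \\ &+ c_{12}\sum x_2(\beta_1 x_1 + \beta_2 x_2 + \beta_0 x_0 + \varepsilon' - \bar{\varepsilon}') \end{aligned}$$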

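When we average over repeated samples, all terms in $\varepsilon'$, such as $c_{11}\sum x_1 \varepsilon'$, vanish because $\varepsilon'$ has mean zero independently of $x_1$, $x_2$, and $x_0$. The terms in $\bar{\varepsilon}'$ vanish in every sample, since $\sum x_1 = \sum x_2 = 0$. Collecting terms in $\beta_1$, $\beta_2$, and $\beta_0$ gives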

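$$\begin{aligned} E(b_1) = {}& \beta_1\left(c_{11}\sum x_1^2 + c_{12}\sum x_1 x_2\right) + \beta_2\left(c_{11}\sum x_1 x_2 + c_{12}\sum x_2^2\right) \\ &+ \beta_0\left(c_{11}\sum x_1 x_0 + c_{12}\sum x_2 x_0\right) \end{aligned}$$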

From the first set of equations satisfied by $c_{11}$ and $c_{12}$ (section 13.4), the coefficient of $\beta_1$ is 1 and that of $\beta_2$ is 0.

What about the coefficient of $\beta_0$? Notice that it resembles

$$c_{11}\sum x_1 y + c_{12}\sum x_2 y = b_1$$

except that $x_0$ has replaced $y$. Hence the coefficient of $\beta_0$ is $b_{01\cdot 2}$, the regression coefficient of $X_0$ on $X_1$ that would be obtained from the sample regression of $X_0$ on $X_1$ and $X_2$. This completes the proof.
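The proof can also be checked by simulation. The sketch below again uses hypothetical $X$ values and $\beta$'s. It computes $c_{11}$ and $c_{12}$ from the $2 \times 2$ matrix of sums of squares and products and confirms that the coefficients of $\beta_1$ and $\beta_2$ are 1 and 0. It then averages $b_1$ over many samples in which only $\varepsilon'$ changes; here $\varepsilon'$ is assumed normal with S.D. 2, which is our choice. The average should be close to $\beta_1 + \beta_0 b_{01\cdot 2}$, not to $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed so the run is reproducible

# Hypothetical X values, held fixed in repeated sampling.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
X0 = np.array([0.5, 1.5, 1.0, 3.0, 2.5, 4.0, 3.5, 5.0])
alpha, beta1, beta2, beta0 = 10.0, 2.0, -1.0, 3.0

# Deviations from the sample means, and c11, c12 from the inverse of the
# matrix of sums of squares and products of x1 and x2.
x1, x2, x0 = X1 - X1.mean(), X2 - X2.mean(), X0 - X0.mean()
S = np.array([[x1 @ x1, x1 @ x2],
              [x1 @ x2, x2 @ x2]])
c11, c12 = np.linalg.inv(S)[0]

# The three bracketed coefficients in the expression for E(b1).
coef_beta1 = c11 * (x1 @ x1) + c12 * (x1 @ x2)   # should be 1
coef_beta2 = c11 * (x1 @ x2) + c12 * (x2 @ x2)   # should be 0
b01_2 = c11 * (x1 @ x0) + c12 * (x2 @ x0)        # X0 on X1, after X2

# Repeated samples: the X's stay fixed and only eps' changes.
n_rep = 20_000
eps = rng.normal(0.0, 2.0, size=(n_rep, X1.size))
Y = alpha + beta1 * X1 + beta2 * X2 + beta0 * X0 + eps   # one sample per row
y = Y - Y.mean(axis=1, keepdims=True)
b1_values = c11 * (y @ x1) + c12 * (y @ x2)               # b1 in each sample

print(coef_beta1, coef_beta2)
print("mean b1:", b1_values.mean(), "  beta1 + beta0*b01.2:", beta1 + beta0 * b01_2)
```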

EXAMPLE 13.6.1 — This example illustrates a property of the bias created in $b_1$ by omitted variables denoted by $X_0$. The bias depends both on the size $\beta_0$ of their effect on $Y$ and on the extent to which $X_0$ varies. Let $Y = X_1 + X_0$, so that $\beta_1 = \beta_0 = 1$. In sample 1, $X_1$ and $X_0$ have the same distribution. Verify that $b_1 = 2$. In sample 2, $X_1$ and $X_0$ still have a perfect correlation, but the variance of $X_0$ is greatly reduced. Verify that $b_1$ is now 1.33, giving a much smaller bias. Of course, steps that reduce the correlation between $X_1$ and $X_0$ are also helpful.


              Sample 1                 Sample 2
         X₁     X₀      Y         X₁     X₀      Y
         −6     −6    −12         −6     −2     −8
         −3     −3     −6         −3     −1     −4
          0      0      0          0      0      0
          0      0      0          0      0      0
          9      9     18          9      3     12
Sum       0      0      0          0      0      0

Sample 1: Σx₁² = 126, Σx₁y = 252.    Sample 2: Σx₁² = 126, Σx₁y = 168.
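Both samples in example 13.6.1 can be verified with a short function in plain Python with no libraries; the function name is ours. It returns $b_1$ together with $b_{01}$, so the bias $\beta_0 b_{01}$ (here $\beta_0 = 1$) can be read off directly.

```python
def b1_and_b01(X1, X0):
    """For Y = X1 + X0, return the slope b1 of Y on X1 and the slope b01 of X0 on X1."""
    n = len(X1)
    m1, m0 = sum(X1) / n, sum(X0) / n
    x1 = [v - m1 for v in X1]
    x0 = [v - m0 for v in X0]
    y = [a + b for a, b in zip(x1, x0)]          # deviations of Y = X1 + X0
    sxx = sum(v * v for v in x1)
    b1 = sum(a * b for a, b in zip(x1, y)) / sxx
    b01 = sum(a * b for a, b in zip(x1, x0)) / sxx
    return b1, b01


X1 = [-6, -3, 0, 0, 9]
sample1 = b1_and_b01(X1, [-6, -3, 0, 0, 9])   # X0 distributed like X1
sample2 = b1_and_b01(X1, [-2, -1, 0, 0, 3])   # X0 with much smaller variance
print(sample1, sample2)
```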



13.7 — Relative importance of different X-variables. In a multiple-regression analysis the question may be asked: Which $X$-variables are most important in determining $Y$? Usually no unique or fully satisfactory answer can be given, but several approaches have been tried. Consider first the situation in which the objective is to predict $Y$ or to “explain” the variation in $Y$. The problem would be fairly straightforward if the $X$-variables were independent. From the model

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$
we have, in the population,

$$\sigma_Y^2 = \beta_1^2\sigma_1^2 + \beta_2^2\sigma_2^2 + \cdots + \beta_k^2\sigma_k^2 + \sigma^2$$

where $\sigma_i^2$ denotes the variance of $X_i$. The quantity $\beta_i^2\sigma_i^2/\sigma_Y^2$ measures the fraction of the variance of $Y$ attributable to its linear regression on $X_i$. This fraction can reasonably be regarded as a measure of the relative importance of $X_i$. With a random sample from this population, the quantities $b_i^2\sum x_i^2/\sum y^2$ are sample estimates of these fractions. In small samples a correction for bias might be advisable, since $b_i^2\sum x_i^2/\sum y^2$ is not an unbiased estimate of $\beta_i^2\sigma_i^2/\sigma_Y^2$.

The square roots of these quantities, $b_i\sqrt{\sum x_i^2/\sum y^2}$, are called the standard partial regression coefficients. They have sometimes been used as measures of relative importance, the $X$'s being ranked in order of the sizes of these coefficients (ignoring sign). The quantity $\sqrt{\sum x_i^2/\sum y^2}$ is regarded as a correction for scale. The coefficient estimates the change in $Y$, as a fraction of $\sigma_Y$, produced by a change of one S.D. in $X_i$.
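The function below is a minimal sketch of the arithmetic: it computes standard partial regression coefficients from the $b$'s and the sums of squares. For illustration only, it uses the values from table 13.7.1. The $X$'s in that table are correlated, so the squares of these numbers should not be read as the fractions of variance discussed above.

```python
from math import sqrt


def standard_partial_coeffs(b, sum_sq_x, sum_sq_y):
    """Standard partial regression coefficients b_i * sqrt(sum x_i^2 / sum y^2)."""
    return [bi * sqrt(sxx / sum_sq_y) for bi, sxx in zip(b, sum_sq_x)]


# Values from table 13.7.1: b1 = 17/15, b2 = 11/15, sum x1^2 = sum x2^2 = 10, sum y^2 = 38.
b_std = standard_partial_coeffs([17 / 15, 11 / 15], [10.0, 10.0], 38.0)
print([round(v, 3) for v in b_std])
```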

In practice, correlations between the $X$'s make the answer more difficult. In many applications, $X_1$ and $X_2$ are positively correlated with each other and with $Y$. For instance, $X_1$ and $X_2$ may be examination scores that predict a student’s ability to do well in a course, and $Y$ the final score in that course. To illustrate this case, table 13.7.1 shows the normal equations, the $b$'s, and the analysis of variance. As the example is constructed, $X_1$ is a slightly better predictor than $X_2$. Together the two account for about 70% of the variation in $Y$: the reduction due to regression is 26.53 out of a total S.S. of 38.00.

As is typical in such applications, each variable’s contribution to $\sum y^2$ is much greater when the variable is used alone than when it follows the other variable. For $X_1$ the two sums of squares are 22.50 and 9.63, respectively, while for $X_2$ they are 16.90 and 4.03. Suppose the sums of squares when $X_1$ and $X_2$ appear alone are taken to measure their contributions to the variation in $Y$. The two contributions then add to 39.40, which is more than $\sum y^2$ (38.00). On the other hand, the sums of squares 9.63 and 4.03 add to only 13.66 and so greatly underestimate the joint contribution (26.53) of $X_1$ and $X_2$. Neither method of measuring the relative contribution is satisfactory.

TABLE 13.7.1

A Common Situation in Two-Variable Regression. Artificial Data

Normal equations:

    10b₁ +  5b₂ = 15
     5b₁ + 10b₂ = 13

c₁₁ = c₂₂ = 2/15;   c₁₂ = −1/15;   b₁ = 17/15;   b₂ = 11/15

Source of Variation           Degrees of Freedom    Sum of Squares
Total                         52                    38.00
Regression on X₁ alone         1                    (Σx₁y)²/Σx₁² = 15²/10 = 22.50
Regression on X₂ after X₁      1                    b₂²/c₂₂ = 11²/30 = 4.03
Regression on X₂ alone         1                    (Σx₂y)²/Σx₂² = 13²/10 = 16.90
Regression on X₁ after X₂      1                    b₁²/c₁₁ = 17²/30 = 9.63
Deviations                    50                    11.47
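All the entries of table 13.7.1 follow from the sums in the normal equations. The sketch below uses exact fractions from Python's `fractions` module. This makes it easy to see that the two ways of splitting the regression sum of squares add to the same total, 398/15 = 26.53.

```python
from fractions import Fraction

# Sums of squares and products of deviations from table 13.7.1, plus sum y^2.
s11, s12, s22 = Fraction(10), Fraction(5), Fraction(10)
s1y, s2y, syy = Fraction(15), Fraction(13), Fraction(38)

det = s11 * s22 - s12 ** 2
c11, c12, c22 = s22 / det, -s12 / det, s11 / det   # inverse of the 2 x 2 matrix
b1 = c11 * s1y + c12 * s2y
b2 = c12 * s1y + c22 * s2y

ss_regression = b1 * s1y + b2 * s2y     # reduction due to X1 and X2 together
ss_x1_alone = s1y ** 2 / s11
ss_x2_after_x1 = b2 ** 2 / c22
ss_x2_alone = s2y ** 2 / s22
ss_x1_after_x2 = b1 ** 2 / c11
ss_deviations = syy - ss_regression

for label, value in [("X1 alone", ss_x1_alone), ("X2 after X1", ss_x2_after_x1),
                     ("X2 alone", ss_x2_alone), ("X1 after X2", ss_x1_after_x2),
                     ("both", ss_regression), ("deviations", ss_deviations)]:
    print(f"{label:12s} {float(value):6.2f}")
```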


Sometimes the investigator’s question is: Is $X_1$ when used alone a better predictor of $Y$ than $X_2$ when used alone? In this case, comparison of the numbers 22.50 and 16.90 is appropriate. Hotelling (7) has given an answer to this question for two $X$-variables, and Williams (8) has extended it to more than two.

In other applications there may be a rational way of deciding the order in which the $X$'s should be brought into the regression, so that their contributions to $\sum y^2$ add up to the correct combined contribution. Fisher (2) studied the variation in the yields of wheat grown continuously on the same plots for many years at Rothamsted. He postulated the sources of variation in the following order: (1) a steady increase or decrease in level of yield, measured by a linear regression on time; (2) other slow changes in yields through time, represented by a polynomial in time with terms in $T^2$, $T^3$, $T^4$, $T^5$; (3) the effect of total annual rainfall on the deviations of yields from the temporal trend; (4) the effect of the distribution of rainfall throughout the growing season on the deviations from the preceding regression.

Finally, suppose the purpose is to learn how to change $Y$ in some population by changing some $X$-variable. The investigator might then estimate the sizes $\Delta X_1$, $\Delta X_2$, etc., of the changes that he can impose on $X_1$ and $X_2$ in this population by a given expenditure of resources. He might then rank the variables by the absolute sizes of $b_i \Delta X_i$, these being the estimated amounts of change that will be produced in $Y$. As we have seen in the preceding section, this approach has numerous pitfalls.

13.8 — Partial and multiple correlation. In a sample of 18-year-old college freshmen, the variables measured might be height, weight, blood pressure, basal metabolism, economic status, aptitude, etc. One purpose might be to examine whether aptitude ($Y$) was linearly related to the physiological measurements. If so, the regression methods of the preceding sections would apply. But the objective might instead be to study the correlations among variables such as height, weight, blood pressure and basal metabolism, none of which can be specified as independent or dependent. In that case, partial correlation methods are appropriate.

You may recall that the ordinary correlation coefficient was closely related to the bivariate normal distribution. With more than two variables, an extension of this distribution called the multivariate normal distribution (9) forms the basic model in correlation studies. A property of the multivariate normal model is that any variable has a linear regression on the other variables (or on any subset of the other variables), with deviations that are normally distributed. Thus, the assumptions made in multivariate regression studies hold for a multivariate normal population.

If there are three variables, there are three simple correlations among them, $\rho_{12}$, $\rho_{13}$, $\rho_{23}$. The partial correlation coefficient $\rho_{12\cdot 3}$ is the correlation between variables 1 and 2 in a cross section of individuals all having the same value of variable 3. The third variable is held constant, so that only variables 1 and 2 are involved in the correlation. In the multivariate normal model, $\rho_{12\cdot 3}$ is the same for every value of variable 3.

A sample estimate $r_{12\cdot 3}$ of $\rho_{12\cdot 3}$ can be obtained by calculating the deviations $d_{13}$ of variable 1 from its sample regression on variable 3. Similarly, find $d_{23}$. Then $r_{12\cdot 3}$ is the simple correlation coefficient between $d_{13}$ and $d_{23}$. The idea is to measure that part of the correlation between variables 1 and 2 that is not simply a reflection of their relations with variable 3. It may be shown that $r_{12\cdot 3}$ satisfies the following formula:

$$r_{12\cdot 3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

Table A 11 is used to test the significance of $r_{12\cdot 3}$. Enter it with $(n - 3)$ degrees of freedom, instead of $(n - 2)$ as for a simple correlation coefficient.
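There are two routes to $r_{12\cdot 3}$: correlate the deviations $d_{13}$ and $d_{23}$, or apply the formula. The sketch below compares them on randomly generated (hypothetical) data using NumPy; the helper names `partial_r` and `residuals` are ours. The two results should agree apart from rounding.

```python
import numpy as np


def partial_r(r12, r13, r23):
    """First-order partial correlation r_{12.3} from the simple correlations."""
    return (r12 - r13 * r23) / np.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def residuals(v, w):
    """Deviations of v from its sample linear regression on w."""
    vd, wd = v - v.mean(), w - w.mean()
    return vd - (vd @ wd) / (wd @ wd) * wd


# Hypothetical data in which variables 1 and 2 are both related to variable 3.
rng = np.random.default_rng(seed=0)
common = rng.normal(size=200)
v1 = common + rng.normal(size=200)
v2 = 0.5 * common + rng.normal(size=200)
v3 = common + 0.3 * rng.normal(size=200)

r = np.corrcoef([v1, v2, v3])
via_formula = partial_r(r[0, 1], r[0, 2], r[1, 2])
d13, d23 = residuals(v1, v3), residuals(v2, v3)
via_deviations = np.corrcoef(d13, d23)[0, 1]
print(via_formula, via_deviations)
```

Applied to the simple correlations in the examples that follow, `partial_r(0.2495, 0.3332, 0.5029)` gives about 0.1005 for $r_{BC\cdot A}$, and `partial_r(0.5784, -0.4865, -0.5296)` gives about 0.4328 for $r_{PF\cdot A}$.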
In Iowa and Nebraska, a random sample of 142 older women was drawn for a study of nutritional status (12). Three of the variables were Age, Blood pressure, and the Cholesterol concentration in the blood. The three simple correlations were

$$r_{AB} = 0.3332, \qquad r_{AC} = 0.5029, \qquad r_{BC} = 0.2495$$

High blood pressure might be associated with above-average amounts of cholesterol in the walls of blood vessels, so it is interesting to examine $r_{BC}$. But both $B$ and $C$ evidently increase with age. Are they correlated merely because of their common association with age, or is there a real relation at every age? The effect of age is eliminated by calculating

$$r_{BC\cdot A} = \frac{0.2495 - (0.3332)(0.5029)}{\sqrt{(1 - 0.3332^2)(1 - 0.5029^2)}} = 0.1005$$

With $f = 142 - 3 = 139$, this correlation is not significant. It may be that within the several age groups blood pressure and blood cholesterol are uncorrelated. At least, the sample is not large enough to detect the correlation if it is present.

As another illustration, consider the consumption of protein and fat among the 54 older women who came from Iowa. The simple correlations were

$$r_{AP} = -0.4865, \qquad r_{AF} = -0.5296, \qquad r_{PF} = 0.5784$$

The third correlation shows that protein and fat occur together in all diets. The first two correlations show that the quantities of both decrease as age advances; both $P$ and $F$ depend on $A$. How closely do they depend on each other at any one age?

$$r_{PF\cdot A} = \frac{0.5784 - (-0.4865)(-0.5296)}{\sqrt{(1 - 0.4865^2)(1 - 0.5296^2)}} = 0.4328$$

Part of the relationship depends on age, but part of it is inherent in the ordinary composition of foods eaten.

To get a clearer notion of the way in which $r_{PF\cdot A}$ is independent of age, consider the six women near 70 years of age. Their protein and fat intakes were

P: 56, 47, 33, 39, 42, 38
F: 56, 83, 49, 52, 65, 52

giving $r = 0.4194$. The correlation is close to the average, $r_{PF\cdot A} = 0.4328$. Similar correlations would be found at other ages.

With four variables the partial correlation coefficient between vari- 
ables 1 and 2 can be computed after eliminating the effects of the other 
variables, 3 and 4. The formula is 
$$r_{12\cdot 34} = \frac{r_{12\cdot 4} - r_{13\cdot 4}r_{23\cdot 4}}{\sqrt{(1 - r_{13\cdot 4}^2)(1 - r_{23\cdot 4}^2)}}$$

or, alternatively,

$$r_{12\cdot 34} = \frac{r_{12\cdot 3} - r_{14\cdot 3}r_{24\cdot 3}}{\sqrt{(1 - r_{14\cdot 3}^2)(1 - r_{24\cdot 3}^2)}},$$

the two formulas giving identical results.
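The agreement of the two formulas can be checked numerically. The sketch below builds a hypothetical $4 \times 4$ correlation matrix from random data and applies the three-variable formula twice along each route. As an extra cross-check that is not part of the text, it also compares both with the standard expression $-P_{12}/\sqrt{P_{11}P_{22}}$, where $P$ is the inverse of the correlation matrix.

```python
import numpy as np


def partial_r(r12, r13, r23):
    """r_{12.3} from three correlations (simple, or partials of the same order)."""
    return (r12 - r13 * r23) / np.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical data for four variables, so that R is a genuine correlation matrix.
rng = np.random.default_rng(seed=2)
data = rng.normal(size=(4, 100))
data[1] += data[0]
data[2] += 0.5 * data[0] + data[3]
R = np.corrcoef(data)


def r(i, j):
    """Simple correlation r_ij, with variables numbered 1-4 as in the text."""
    return R[i - 1, j - 1]


# First formula: eliminate variable 4, then variable 3.
route_a = partial_r(partial_r(r(1, 2), r(1, 4), r(2, 4)),
                    partial_r(r(1, 3), r(1, 4), r(3, 4)),
                    partial_r(r(2, 3), r(2, 4), r(3, 4)))

# Second formula: eliminate variable 3, then variable 4.
route_b = partial_r(partial_r(r(1, 2), r(1, 3), r(2, 3)),
                    partial_r(r(1, 4), r(1, 3), r(4, 3)),
                    partial_r(r(2, 4), r(2, 3), r(4, 3)))

# Cross-check (not from the text), using the inverse correlation matrix.
P = np.linalg.inv(R)
route_c = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

print(route_a, route_b, route_c)
```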

To test this quantity in table A 11, use $(n - 4)$ degrees of freedom.

As we have stated, partial correlation does not involve the notion of independent and dependent variables; it is a measure of interdependence. The multiple correlation coefficient, on the other hand, applies when one variable, say $Y$, has been singled out so that its joint relation with the other variables can be examined. In the population, the multiple correlation coefficient between $Y$ and $X_1, X_2, \ldots, X_k$ is defined as the simple correlation coefficient between $Y$ and its linear regression on $X_1, \ldots, X_k$, namely $\beta_1 X_1 + \cdots + \beta_k X_k$. Since it is hard to attach a useful meaning to the sign of this correlation, most applications deal with its square. As would be expected, the sample estimate $R$ of a multiple correlation coefficient is the simple correlation between $y$ and $\hat{y} = b_1 x_1 + \cdots + b_k x_k$. This gives

$$R^2 = \frac{\left(\sum y\hat{y}\right)^2}{\left(\sum y^2\right)\left(\sum \hat{y}^2\right)}$$

In formula 13.3.6 (p. 388) it was shown that $\sum d\hat{y} = 0$, where $d = y - \hat{y}$. It follows that $\sum y\hat{y} = \sum \hat{y}^2$. Hence,

$$R^2 = \frac{\sum \hat{y}^2}{\sum y^2}, \qquad 1 - R^2 = \frac{\sum d^2}{\sum y^2}$$

Thus, in the analysis of variance of a multiple regression, $R^2$ is the fraction of the sum of squares of deviations of $Y$ from its mean that is attributable to the regression, while $(1 - R^2)$ is the fraction not associated with the regression. This result is a natural extension of the corresponding result (section 7.3) for a simple correlation coefficient. The test of the null hypothesis that the multiple correlation in the population is zero is identical to the $F$-test of the null hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_k = 0$. The relation is

$$F = \frac{(n - k - 1)R^2}{k(1 - R^2)}, \quad \text{with } k \text{ and } (n - k - 1) \text{ df}$$
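Take the artificial data of table 13.7.1, with $k = 2$ and $n = 53$ (the total has 52 degrees of freedom). The sketch below computes $R^2$ and the $F$ ratio from the formula above. As a check, it compares this $F$ with the $F$ obtained directly from the regression and deviations mean squares in that table. The function name is ours.

```python
def r_squared_and_f(ss_regression, ss_total, n, k):
    """R^2 = sum(yhat^2) / sum(y^2), and F = (n - k - 1) R^2 / (k (1 - R^2))."""
    r2 = ss_regression / ss_total
    f = (n - k - 1) * r2 / (k * (1 - r2))
    return r2, f


ss_reg, ss_tot, n, k = 398 / 15, 38.0, 53, 2      # table 13.7.1
r2, f = r_squared_and_f(ss_reg, ss_tot, n, k)

# The same F from the analysis of variance: regression MS / deviations MS.
f_anova = (ss_reg / k) / ((ss_tot - ss_reg) / (n - k - 1))
print(round(r2, 4), round(f, 2), round(f_anova, 2))
```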

EXAMPLE 13.8.1 — Brunson and Willier (13) examined the correlations among ear circumference $E$, cob circumference $C$, and number of rows of kernels $K$, calculated from measurements of 900 ears of corn:

$$r_{EC} = 0.799, \qquad r_{EK} = 0.570, \qquad r_{CK} = 0.507$$

Among the ears having the same kernel number, what is the correlation between $E$ and $C$? Ans. $r_{EC\cdot K} = 0.720$.

EXAMPLE 13.8.2 — Among ears of corn having the same circumference, is there any correlation between $C$ and $K$? Ans. $r_{CK\cdot E} = 0.105$.

EXAMPLE 13.8.3 — In a random sample of 54 Iowa women (12), the intake of two nutrients was determined together with age and the concentration of cholesterol in the blood. If $P$ symbolizes protein, $F$ fat, $A$ age, and $C$ cholesterol, the correlations are as follows:


What is the correlation between age and cholesterol independent of the intake of protein and fat? Ans.

$$r_{AC\cdot PF} = \frac{r_{AC\cdot F} - r_{AP\cdot F}\,r_{CP\cdot F}}{\sqrt{(1 - r_{AP\cdot F}^2)(1 - r_{CP\cdot F}^2)}} = \frac{0.3820 - (-0.2604)(-0.3145)}{\sqrt{(1 - 0.2604^2)(1 - 0.3145^2)}} = 0.327$$

EXAMPLE 13.8.4 — Show that the sample estimate of the fraction of the variance of $Y$ that is attributable to its linear regression on $X_1, \ldots, X_k$ is

$$1 - \frac{(1 - R^2)(n - 1)}{(n - k - 1)}$$
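The expression in example 13.8.4, often called the adjusted $R^2$, is simple to evaluate. The sketch below only evaluates it and does not prove it; the function name is ours. Whenever $R^2 < 1$ and $k \geq 1$, the result is smaller than $R^2$.

```python
def adjusted_fraction(r2, n, k):
    """1 - (1 - R^2)(n - 1)/(n - k - 1), the estimate in example 13.8.4."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Table 13.7.1: R^2 = 398/570, n = 53, k = 2.
print(round(adjusted_fraction(398 / 570, 53, 2), 4))
# Section 13.10 data: R^2 = 0.5493 with all three X's, n = 18.
print(round(adjusted_fraction(0.5493, 18, 3), 4))
```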


13.9 — Three or more independent variables. Computations. The formulas already described for two $X$-variables extend naturally to three or more $X$-variables. The computations inevitably become lengthier, and they are ideally suited to an electronic computer. We shall describe one of the standard methods for a desk calculating machine, the Abbreviated Doolittle method (10). For clarity, we give more steps than an experienced operator needs. For more extensive discussion of computing methods, see (11).

With three independent variables, the normal equations are:

$$\begin{aligned} b_1\sum x_1^2 + b_2\sum x_1x_2 + b_3\sum x_1x_3 &= \sum x_1y \\ b_1\sum x_1x_2 + b_2\sum x_2^2 + b_3\sum x_2x_3 &= \sum x_2y \\ b_1\sum x_1x_3 + b_2\sum x_2x_3 + b_3\sum x_3^2 &= \sum x_3y \end{aligned}$$

If the $c$'s are needed, as in most applications, the right sides become 1, 0, 0 for $c_{11}, c_{12}, c_{13}$; 0, 1, 0 for $c_{21}, c_{22}, c_{23}$; and 0, 0, 1 for $c_{31}, c_{32}, c_{33}$.

The same calculating routine can be used for the $b$'s and the $c$'s; only the right sides differ. We therefore denote the unknowns by $z_1, z_2, z_3$, and let $s_{ij} = \sum x_i x_j$. The equations to be solved are:


(1)  $s_{11}z_1 + s_{12}z_2 + s_{13}z_3 =$

(2)  $s_{12}z_1 + s_{22}z_2 + s_{23}z_3 =$

(3)  $s_{13}z_1 + s_{23}z_2 + s_{33}z_3 =$


The right side is not specified, since it depends on whether the $b$'s or $c$'s are being computed.

The Doolittle method eliminates $z_1$, then $z_1$ and $z_2$, solving for $z_3$. Intermediate steps provide convenient equations for finding $z_2$ from $z_3$, and finally $z_1$ from $z_2$ and $z_3$. The computing routine can be carried out without any thought as to why it works. The explanation is given in this section.

The first step, line (4), is to recopy line (1).

(4)  $s_{11}z_1 + s_{12}z_2 + s_{13}z_3 =$

Now divide through by $s_{11}$. It is quicker to find the reciprocal, $1/s_{11}$, and multiply through by it. This gives

(5)  $z_1 + [s_{12}/s_{11}]\,z_2 + \{s_{13}/s_{11}\}\,z_3 =$

The coefficients of $z_2$ and $z_3$ have been bracketed, since they play a key role. Multiply (4) by $s_{12}/s_{11}$, obtaining



(6)  $s_{12}z_1 + \dfrac{s_{12}^2}{s_{11}}z_2 + \dfrac{s_{12}s_{13}}{s_{11}}z_3 =$


In steps (5) and (6) and in all subsequent steps, the right side of the equation is always multiplied by the same factor as the left side. Now subtract (6) from (2) to get rid of $z_1$.


(7)  $\left(s_{22} - \dfrac{s_{12}^2}{s_{11}}\right)z_2 + \left(s_{23} - \dfrac{s_{12}s_{13}}{s_{11}}\right)z_3 =$


The next operations resemble those in lines (4) to (6). Find the reciprocal of $(s_{22} - s_{12}^2/s_{11})$ and multiply (7) by this reciprocal.


(8)  $z_2 + \left\{\dfrac{s_{23} - s_{12}s_{13}/s_{11}}{s_{22} - s_{12}^2/s_{11}}\right\}z_3 =$


The coefficient of $z_3$ in (8) receives a curly bracket, like that of $z_3$ in (5). Reverting to (4) and (5), multiply (4) by the bracketed $s_{13}/s_{11}$ in (5).


(9)  $s_{13}z_1 + \dfrac{s_{12}s_{13}}{s_{11}}z_2 + \dfrac{s_{13}^2}{s_{11}}z_3 =$


Similarly, multiply (7) by the bracketed coefficient of $z_3$ in (8).


(10)  $\left(s_{23} - \dfrac{s_{12}s_{13}}{s_{11}}\right)z_2 + \dfrac{(s_{23} - s_{12}s_{13}/s_{11})^2}{s_{22} - s_{12}^2/s_{11}}\,z_3 =$


Now take (3) − (9) − (10). The coefficients of $z_1$ and $z_2$ both disappear, leaving an equation (11) with only $z_3$ on the left. Solve this for $z_3$. (If there are four $X$-variables, continue through another cycle of these operations, ending with an equation in which $z_4$ alone appears.)

Having $z_3$, find $z_2$ from (8), and finally $z_1$ from (5). With familiarity, the operator will find that lines (6), (9), and (10) need not be written down when he is using a modern desk machine.
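Steps (4) to (11) translate directly into code. The sketch below is a minimal Python transcription for exactly three equations; the function name `doolittle3` is ours, and a general routine would loop over the cycles instead. It is applied to the normal equations of table 13.10.2, and the result is cross-checked against NumPy's general solver. Calling it with right sides (1, 0, 0), (0, 1, 0), (0, 0, 1) gives the $c$'s.

```python
import numpy as np


def doolittle3(s, rhs):
    """Solve a symmetric 3 x 3 system s z = rhs following lines (4)-(11)."""
    (s11, s12, s13), (_, s22, s23), (_, _, s33) = s
    r1, r2, r3 = rhs
    # (4)-(5): copy line (1) and multiply it by the reciprocal 1/s11.
    k12, k13, k1r = s12 / s11, s13 / s11, r1 / s11    # [s12/s11], {s13/s11}
    # (6)-(7): line (2) minus line (4) x [s12/s11]; z1 drops out.
    a22, a23, a2r = s22 - s12 * k12, s23 - s13 * k12, r2 - r1 * k12
    # (8): multiply line (7) by the reciprocal 1/a22.
    k23, k2r = a23 / a22, a2r / a22                    # bracketed {a23/a22}
    # (9)-(11): line (3) minus (4) x {s13/s11} minus (7) x {a23/a22}.
    a33 = s33 - s13 * k13 - a23 * k23
    a3r = r3 - r1 * k13 - a2r * k23
    # Back-substitution: z3 from (11), z2 from (8), z1 from (5).
    z3 = a3r / a33
    z2 = k2r - k23 * z3
    z1 = k1r - k12 * z2 - k13 * z3
    return z1, z2, z3


# Lines (1)-(3) of table 13.10.2.
S = [[1752.96, 1085.61, 1200.00],
     [1085.61, 3155.78, 3364.00],
     [1200.00, 3364.00, 35572.00]]
rhs = [3231.48, 2216.44, 7593.00]

b = doolittle3(S, rhs)
check = np.linalg.solve(np.array(S), np.array(rhs))     # general-purpose solver
C = np.array([doolittle3(S, e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])])
print(np.round(b, 5), np.round(check, 5))
```

Because the matrix is symmetric, the rows of `C` are also its columns, so `C` is the full matrix of $c$'s.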
The next two sections give numerical examples of the calculation of the $b$'s and $c$'s. The numbering of the lines and all computing instructions in these examples are exactly as in this section.

13.10 — Numerical example. Computing the b's. In table 13.10.1 an additional independent variable, $X_3$, is taken from the original data in the plant-available phosphorus investigation. Like $X_2$, the variable $X_3$ measures organic phosphorus, but of a different type. As before, $Y$ is the estimated plant-available phosphorus in corn grown at Soil Temperature 20°C. The data for Soil Temperature 35°C are considered later.


TABLE 13.10.1

Phosphorus Fractions in Various Calcareous Soils, and Estimated Plant-available Phosphorus at Two Soil Temperatures

Soil        Phosphorus Fractions       Estimated Plant-available
Sample      in Soil, ppm*              Phosphorus in Soil, ppm
No.                                    Soil Temp. 20°C    Soil Temp. 35°C
            X₁      X₂      X₃         Y                  Y′
 1           0.4    53     158         64                  93
 2           0.4    23     163         60                  73
 3           3.1    19      37         71                  38
 4           0.6    34     157         61                 109
 5           4.7    24      59         54                  54
 6           1.7    65     123         77                 107
 7           9.4    44      46         81                  99
 8          10.1    31     117         93                  94
 9          11.6    29     173         93                  66
10          12.6    58     112         51                 126
11          10.9    37     111         76                  75
12          23.1    46     114         96                 108
13          23.1    50     134         77                  90
14          21.6    44      73         93                  72
15          23.1    56     168         95                  90
16           1.9    36     143         54                  82
17          26.8    58     202        168                 128
18          29.9    51     124         99                 120

* X₁ = inorganic phosphorus by Bray and Kurtz method
  X₂ = organic phosphorus soluble in K₂CO₃ and hydrolyzed by hypobromite
  X₃ = organic phosphorus soluble in K₂CO₃ and not hydrolyzed by hypobromite
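The entries in lines (1) to (3) of table 13.10.2 can be reproduced from the raw data. The sketch below uses Python with NumPy and only the 20°C column. It forms the sums of squares and products of deviations, solves the normal equations, and computes the intercept and the reduction in sum of squares. Apart from rounding, the results should agree with table 13.10.2 and example 13.10.1.

```python
import numpy as np

# Table 13.10.1, 20 deg C only. Columns: X1, X2, X3, Y.
data = np.array([
    [0.4, 53, 158, 64], [0.4, 23, 163, 60], [3.1, 19, 37, 71],
    [0.6, 34, 157, 61], [4.7, 24, 59, 54], [1.7, 65, 123, 77],
    [9.4, 44, 46, 81], [10.1, 31, 117, 93], [11.6, 29, 173, 93],
    [12.6, 58, 112, 51], [10.9, 37, 111, 76], [23.1, 46, 114, 96],
    [23.1, 50, 134, 77], [21.6, 44, 73, 93], [23.1, 56, 168, 95],
    [1.9, 36, 143, 54], [26.8, 58, 202, 168], [29.9, 51, 124, 99],
])
X, Y = data[:, :3], data[:, 3]

# Deviations from the means give the entries of the normal equations.
x = X - X.mean(axis=0)
y = Y - Y.mean()
S = x.T @ x        # sums of squares and products of the x's
rhs = x.T @ y      # sums of products with y

b = np.linalg.solve(S, rhs)
intercept = Y.mean() - X.mean(axis=0) @ b
reduction = b @ rhs          # sum of squares due to regression
total = y @ y                # total sum of squares of Y

print(np.round(S, 2))
print(np.round(rhs, 2))
print(np.round(b, 5), round(intercept, 2), round(reduction), round(total))
```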


In general, problems in which the $b$'s but not the $c$'s are wanted arise only when the investigator is certain that all the $X$'s must be present in the regression equation and does not want to test individual $b_i$ or compute confidence limits for any $\beta_i$. The present example is a borderline case. A primary objective was to determine whether there exists an independent effect of soil organic phosphorus on the phosphorus nutrition of plants. That is, the investigators wished to know whether $X_2$ and $X_3$ are related to $Y$ after allowing for the relation between $Y$ and $X_1$ (soil inorganic phosphorus). As a first step, we can work out the regression of $Y$ on all three variables, obtaining the reduction in sum of squares of $Y$. The reduction due to a regression on $X_1$ alone is $(\sum x_1 y)^2/\sum x_1^2$. By subtraction, the additional reduction due to a regression on $X_2$ and $X_3$ is obtained. It can be tested against the Deviations mean square by an $F$-test. If $F$ is near 1, this probably settles the issue and the $c$'s are not needed. But if $F$ is close to its significance level, we will want to examine $b_2$ and $b_3$ individually, since one type of organic phosphorus might show an independent relation with $Y$ but not the other.


TABLE 13.10.2

Solution of Three Normal Equations. Abbreviated Doolittle Method

Line   Reciprocal    Instructions          X₁          X₂           X₃            Y
(1)                                        1,752.96    1,085.61     1,200.00      3,231.48
(2)                                        1,085.61    3,155.78     3,364.00      2,216.44
(3)                                        1,200.00    3,364.00    35,572.00      7,593.00
(4)                  copy (1)              1,752.96    1,085.61     1,200.00      3,231.48
(5)    .000570464    (4) × .000570464      1           [.61930]     {.68456}      1.84344
(6)                  (4) × [.61930]                      672.32       743.16      2,001.26
(7)                  (2) − (6)                         2,483.46     2,620.84        215.18
(8)    .000402664    (7) × .000402664                  1            {1.05532}      .08665
(9)                  (4) × {.68456}                                   821.47      2,212.14
(10)                 (7) × {1.05532}                                2,765.82        227.08
(11)                 (3) − (9) − (10)                              31,984.71      5,153.78

From line (11):  b₃ = 5,153.78 ÷ 31,984.71 = 0.16113
From line (8):   b₂ = .08665 − (1.05532)b₃ = −0.08339
From line (5):   b₁ = 1.84344 − (.61930)b₂ − (.68456)b₃ = 1.78478

Reduction in SS = Σbᵢ(Σxᵢy) = (1.78478)(3,231.48) + (−0.08339)(2,216.44) + (0.16113)(7,593.00) = 6,806


The normal equations and computation of the $b$'s are in table 13.10.2. Before starting, consider whether some coding of the normal equations is advisable. If the sizes of the $\sum x^2$ differ greatly, it is more difficult to keep track of the decimal places. Division or multiplication of some $X$'s by a power of 10 will help. If $X_i$ is divided by $10^p$, $\sum x_i^2$ is divided by $10^{2p}$, and $\sum x_i x_j$ or $\sum x_i y$ by $10^p$. Note that $b_i$ is then multiplied by $10^p$ and therefore must be divided by $10^p$ in a final decoding. For practice, see example 13.10.6. In this example no coding seems necessary.

It is hoped that the calculations can be easily followed from the column of Instructions. In equations like (5), in which the coefficient of the leading $b$ is 1, we carried five decimal places, usually enough with three or four $X$-variables. Don’t forget that the $b$'s are found in reverse order: $b_3$, then $b_2$, then $b_1$. Since mistakes in calculation are hard to avoid, always check the $b$'s by substituting them in the original equations, which should be satisfied apart from rounding errors. At the end, the reduction in sum of squares of $Y$ is computed.
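The substitution check recommended above takes only a few lines. This sketch plugs the rounded $b$'s from table 13.10.2 back into the three original normal equations. Each left side should match its right side up to rounding (here within about 0.1). The last lines recompute the reduction in sum of squares.

```python
# Lines (1)-(3) of table 13.10.2 and the rounded b's found from them.
S = [[1752.96, 1085.61, 1200.00],
     [1085.61, 3155.78, 3364.00],
     [1200.00, 3364.00, 35572.00]]
rhs = [3231.48, 2216.44, 7593.00]
b = [1.78478, -0.08339, 0.16113]            # b1, b2, b3

# Substitute the b's into each original normal equation.
lhs = [sum(s_ij * b_j for s_ij, b_j in zip(row, b)) for row in S]
for left, right in zip(lhs, rhs):
    print(f"{left:10.2f}  vs  {right:10.2f}")

# Reduction in sum of squares: sum of b_i times sum(x_i y).
reduction = sum(b_i * r_i for b_i, r_i in zip(b, rhs))
print(round(reduction))
```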

Table 13.10.3 gives the analysis of variance and the combined test of $X_2$ and $X_3$. Since $F = 1.06$, it seems clear that neither form of organic phosphorus is related to $Y$ in these data.


TABLE 13.10.3

Analysis of Variance and Test of X₂, X₃

Source of Variation               Degrees of Freedom    Sum of Squares    Mean Square    F
Total                             17                    12,390
Regression on X₁, X₂, X₃           3                     6,806
Regression on X₁                   1                     5,957*
Regression on X₂, X₃ after X₁      2                       849             424            1.06
Deviations                        14                     5,584             399

* (Σx₁y)²/Σx₁² = (3,231.48)²/1,752.96


Some general features of multiple regression may now be observed:

1. As noted before, the regression coefficients change with each new grouping of the $X$'s. With $X_2$ alone, $b_{Y2} = 2{,}216.44/3{,}155.78 = 0.7023$. Adding $X_1$ gives $b_{Y2\cdot 1} = 0.0866$. With all three $X$'s, $b_{Y2\cdot 13} = -0.0834$. In any one multiple regression the coefficients are intercorrelated, so either increasing or decreasing the number of $X$'s generally changes all the $b$'s.

2. The value of $\sum \hat{y}^2$ never decreases with the addition of new $X$'s; ordinarily it increases. Take $X_1$ alone: $\sum \hat{y}_1^2 = (3{,}231.48)^2/1{,}752.96 = 5{,}957$. $X_1$ and $X_2$ together give $\sum \hat{y}_{12}^2 = 5{,}976$. For all three, $\sum \hat{y}_{123}^2 = 6{,}806$. The increase may be small and nonsignificant, but it estimates the contribution of the added $X$.

3. For checking calculations it is worth noting that $\Sigma\hat y^2$ cannot be greater than $\Sigma y^2$; nearly always it is less. Only if the X's predict Y perfectly can $\Sigma\hat y^2 = \Sigma y^2$. In that limiting case, $\Sigma d^2 = 0$.

4. High correlation between two of the X's can upset calculations. If $r_{ij}$ is above 0.95, even 6 or 8 significant digits may not be sufficient to control rounding errors. Consider eliminating one of the two X's.

5. If $\Sigma\hat y^2$ is only a small fraction of $\Sigma y^2$, that is, if $R^2$ is small, remember that most of the variation in Y is unexplained. It may be random variation, or it may be due to other independent variables not considered in the regression. If these other variables were found and brought in, the relations among the X's already included might change completely.
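Points 2 and 3 can be checked numerically by turning the reductions quoted in point 2 into values of $R^2 = \Sigma\hat y^2/\Sigma y^2$; the results agree with the answers to example 13.10.5. A minimal sketch, taking $\Sigma y^2 = 12{,}390$ from table 13.10.3:

```python
# Reductions in sum of squares of Y quoted in point 2 (phosphorus example).
TOTAL_SS = 12390.0                    # sum y^2, from table 13.10.3
steps = [
    ("X1", 3231.48 ** 2 / 1752.96),   # (sum x1 y)^2 / sum x1^2, about 5,957
    ("X1, X2", 5976.0),
    ("X1, X2, X3", 6806.0),
]

r_squared = []
for label, reduction in steps:
    assert reduction <= TOTAL_SS      # point 3: sum yhat^2 cannot exceed sum y^2
    r_squared.append(reduction / TOTAL_SS)
    print(f"R^2 on {label}: {r_squared[-1]:.4f}")

# Point 2: R^2 (like sum yhat^2) never decreases as X's are added.
assert all(a <= b for a, b in zip(r_squared, r_squared[1:]))
```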



408 Chapter 13: Multiple Regression 

EXAMPLE 13.10.1 — Compute the regression of plant-available phosphorus on the 3 fractions. Ans. $\hat Y = 1.7848X_1 - 0.0834X_2 + 0.1611X_3 + 43.67$.

EXAMPLE 13.10.2 — Estimate the plant-available phosphorus in soil sample 17 and compare it with the observed value. Ans. 119 ppm; $Y - \hat Y = 49$ ppm.

EXAMPLE 13.10.3 — The experimenter might have information which would lead him to retain $X_3$ along with $X_1$ in his predicting equation, dropping $X_2$. Calculate the new regression. Ans. $\hat Y = 1.737X_1 + 0.155X_3 + 41.5$.

EXAMPLE 13.10.4 — Calculate the sum of squares due to $X_2$ after $X_1$ and $X_3$. Ans. 16.

EXAMPLE 13.10.5 — Calculate $R^2$ with $X_1$ alone, with $X_1$ and $X_2$, and with $X_1$, $X_2$, $X_3$. Ans. $R_{Y\cdot1}^2 = 0.4808$, $R_{Y\cdot12}^2 = 0.4823$, $R_{Y\cdot123}^2 = 0.5493$. Notice that $R^2$ never decreases with the addition of a new X; ordinarily it increases. Associate this with the corresponding theorem about $\Sigma\hat y^2$.

EXAMPLE 13.10.6 — In a multiple regression the original normal equations were as follows:

    $X_1$       $X_2$         $X_3$          $Y$
    1.28        17.20         85.20          2.84
    17.20       2,430.00      7,160.00       183.00
    85.20       7,160.00      67,200.00      8,800.00

It was decided to divide $X_2$ by 10 and $X_3$ by 100 before starting the solution. What happens to $\Sigma x_1x_3$, $\Sigma x_2y$, $\Sigma x_2x_3$, $\Sigma x_3^2$, $\Sigma x_2^2$, $\Sigma x_3y$? Ans. They become 0.852, 18.30, 7.16, 6.72, 24.30, 88.00.

EXAMPLE 13.10.7 — In studies of the fertilization of red clover by honey bees (28), it was desired to learn the effects of various lengths of the insects' probosces. The measurement is difficult, so a pilot experiment was performed to determine a more convenient one that might be highly correlated with proboscis length. Three measurements were tried on 44 bees with the results indicated:

n = 44

                 Dry Weight,    Length of Wing,   Width of Wing,   Length of Proboscis,
                 $X_1$ (mg.)    $X_2$ (mm.)       $X_3$ (mm.)      $Y$ (mm.)
Mean             13.10          9.61              3.28             6.59

Sum of Squares and Products

                 $X_1$          $X_2$             $X_3$            $Y$
$X_1$            16.6840        1.9279            0.8240           1.5057
$X_2$                           0.9924            0.3351           0.5989
$X_3$                                             0.2248           0.1848
$Y$                                                                0.6831

Coding is scarcely necessary. Carrying 5 decimal places, calculate the regression coefficients. Ans. 0.0292, 0.6151, -0.2022.

EXAMPLE 13.10.8 — Test the significance of the overall regression and compute the value of $R^2$. Ans. $F = 16.2$, $f = 3$ and 40, P very small. $R^2 = 0.55$, a disappointing value when the objective is high accuracy in predicting Y.

EXAMPLE 13.10.9 — Test the significance of the joint effect of $X_1$ and $X_3$ after fitting $X_2$. Ans. $F = 0.87$. Can you conclude anything about the relative usefulness of the three predictors?
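Examples 13.10.7 to 13.10.9 can be worked with a short program. The sketch below is one possible implementation, not the worksheet method of the text: it solves the normal equations by straightforward elimination in double precision, so its coefficients may differ from the printed answers in the third or fourth decimal place, which were obtained by carrying 5 decimal places by hand. The helper names `solve` and `fit` are our own.

```python
# Example 13.10.7 (44 bees): sums of squares and products; order X1, X2, X3.
SXX = [
    [16.6840, 1.9279, 0.8240],
    [1.9279, 0.9924, 0.3351],
    [0.8240, 0.3351, 0.2248],
]
SXY = [1.5057, 0.5989, 0.1848]
SYY, N = 0.6831, 44


def solve(a, g):
    """Gaussian elimination with back substitution (A symmetric positive definite)."""
    k = len(g)
    m = [list(row) + [rhs] for row, rhs in zip(a, g)]
    for p in range(k):
        for r in range(p + 1, k):
            f = m[r][p] / m[p][p]
            for c in range(p, k + 1):
                m[r][c] -= f * m[p][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


def fit(idx):
    """Return (b's, reduction in S.S. of Y) for the regression on the X's in idx."""
    a = [[SXX[i][j] for j in idx] for i in idx]
    g = [SXY[i] for i in idx]
    b = solve(a, g)
    return b, sum(bi * gi for bi, gi in zip(b, g))


b, red_full = fit([0, 1, 2])                      # example 13.10.7
dev_df = N - 1 - 3                                # 40 d.f.
dev_ms = (SYY - red_full) / dev_df
r2 = red_full / SYY                               # example 13.10.8
f_overall = (red_full / 3) / dev_ms               # 3 and 40 d.f.

_, red_x2 = fit([1])                              # X2 alone
f_after_x2 = ((red_full - red_x2) / 2) / dev_ms   # example 13.10.9: 2 and 40 d.f.

print("b =", [round(v, 4) for v in b])
print(f"R^2 = {r2:.3f}, overall F = {f_overall:.1f}, F for X1, X3 after X2 = {f_after_x2:.2f}")
```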


13.11 — Numerical example. Computing the inverse matrix. Table 
13.11.1 gives the worksheet in which the c’s are computed. The comput- 
ing Instructions (column 2) are the same as in sections 13.9 and 13.10. 
The following points are worth noting: 

1. In many problems the c's are small numbers with numerous zeros after the decimal point. For those who have difficulty in keeping track of the zeros, the following pre-coding is recommended. Code each $X_i$, if necessary, so that every $\Sigma x_i^2$ lies between 0.1 and 10. This can always be done by dividing $X_i$ by a power of 10. If $X_1$ is divided by $10^p$ and $X_2$ by $10^r$, then $\Sigma x_1^2$ is divided by $10^{2p}$, $\Sigma x_2^2$ by $10^{2r}$, and $\Sigma x_1x_2$ by $10^{p+r}$. In this example we had initially (table 13.10.2) $\Sigma x_1^2 = 1{,}752.96$, $\Sigma x_2^2 = 3{,}155.78$, $\Sigma x_3^2 = 35{,}572.00$. Division of every $X_i$ by $10^2$ makes the first two sums of squares lie between 0.1 and 1, while $\Sigma x_3^2$ lies between 1 and 10, as shown in table 13.11.1. Every $\Sigma x_ix_j$ is also divided by $10^4$. The advantage is that the coded c's are usually not far from 1. Five decimal places will be carried throughout the calculations.

2. The three sets of c's are found simultaneously. The computations in column 6 give $c_{11}$, $c_{12}$, $c_{13}$; those in column 7 give $c_{12}$, $c_{22}$, $c_{23}$; and those in column 8 give $c_{13}$, $c_{23}$, $c_{33}$. Because of the symmetry, quantities like $c_{12}$ are found only once.

3. Column 9, the sum of columns 3 to 8, is a check sum. Since mistakes creep in, check in each line marked ✓ that column 9 is the sum of columns 3 to 8. In some lines, e.g. (6), this check does not apply because of abbreviations in the method.

4. The first three numbers found in line (12) are $c_{13}$, $c_{23}$, $c_{33}$ in coded form. Then we return to line (8). With column 7 as the right side, line (8) reads

$$c_{22} + 1.05533c_{23} = 4.02658$$

With column 6 as the right side, line (8) reads

$$c_{12} + 1.05533c_{13} = -2.49358$$

These give $c_{22}$ and $c_{12}$. Finally, $c_{11}$ comes from line (5).

5. To decode, $c_{ij}$ is divided by the same factor by which $\Sigma x_ix_j$ was divided in coding.

6. By copying the $\Sigma x_iy$ next to the $c_{ij}$, the $b_i$ are easily computed. Then the reduction in S.S. due to regression and the Deviations mean square are obtained. These enable the standard error of each $b_i$ to be placed next to it. As anticipated, neither $b_2$ nor $b_3$ approaches the significance level.
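Point 6 can be illustrated with the decoded $c$'s listed in example 13.11.1 and the coefficients of example 13.10.1. The standard error of $b_i$ is $s\sqrt{c_{ii}}$, with $s^2 = 399$ on 14 d.f.; the 5% value $t = 2.145$ for 14 d.f. is the one used in example 13.11.1. A minimal sketch:

```python
import math

S2 = 399.0                                   # Deviations mean square, 14 d.f.
T_05 = 2.145                                 # 5% point of t for 14 d.f.
b = [1.7848, -0.0834, 0.1611]                # example 13.10.1
c_diag = [0.0007249, 0.0004375, 0.0000313]   # decoded c11, c22, c33 (example 13.11.1)

ses = [math.sqrt(S2 * c) for c in c_diag]    # s.e.(b_i) = s * sqrt(c_ii)
ts = [bi / se for bi, se in zip(b, ses)]
for i, (bi, se, t) in enumerate(zip(b, ses, ts), start=1):
    verdict = "significant" if abs(t) > T_05 else "not significant"
    print(f"b{i} = {bi:8.4f}   s.e. = {se:.4f}   t = {t:6.2f}   ({verdict} at 5%)")
```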

(From table 13.11.1: $\Sigma y^2 = 12{,}390$; Reduction $= \Sigma b_i(\Sigma x_iy) = 6{,}806$; $s^2 = 5{,}584/14 = 399$.)

Occasionally there are several Y-variables whose sample regressions on the same set of X-variables are to be worked out. In the phosphorus experiment, corn was grown in every soil sample at 35°C as well as at 20°C. The amounts of phosphorus in the plants, $Y'$, are shown in the last column of table 13.10.1. Since the inverse matrix is the same for $Y'$ as for $Y$, it is necessary to calculate only the new sums of products,


$$\Sigma x_1y' = 1{,}720.42,\quad \Sigma x_2y' = 4{,}337.56,\quad \Sigma x_3y' = 8{,}324.00$$


Combining these with the c's already calculated, the regression coefficients for $Y'$ are


$$b_1' = 0.1619,\quad b_2' = 1.1957,\quad b_3' = 0.1155$$


In the new data, $\Sigma\hat y'^2 = 6{,}426$, $\Sigma d^2 = 12{,}390 - 6{,}426 = 5{,}964$, and $s'^2 = 5{,}964/14 = 426.0$. The standard errors of the three regression coefficients are 0.556, 0.431, and 0.115. These lead to the three values of t: 0.29, 2.77, and 1.00. At 35°C, $b_2'$ is the only significant regression coefficient. The interpretation made was that at 35°C there was some mineralization of the organic phosphorus which would make it available to the plants.
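Because only the right-hand sides change, the coefficients for $Y'$ follow from $b_i' = \sum_j c_{ij}\,\Sigma x_jy'$. The sketch below uses the decoded $c$'s of example 13.11.1 and the value $\Sigma y'^2 = 12{,}390$ used in the text; since the $c$'s are rounded, the results agree with the text only to about the third decimal place of the b's.

```python
import math

# Decoded inverse multipliers c_ij for the phosphorus example (example 13.11.1).
C = [
    [0.0007249, -0.0002483, -0.0000010],
    [-0.0002483, 0.0004375, -0.0000330],
    [-0.0000010, -0.0000330, 0.0000313],
]
SXY_NEW = [1720.42, 4337.56, 8324.00]   # sum x_i y' for corn grown at 35 deg C
SYY_NEW = 12390.0                        # sum y'^2 as used in the text
DEV_DF = 14

b_new = [sum(cij * g for cij, g in zip(row, SXY_NEW)) for row in C]   # b' = C g'
reduction = sum(bi * g for bi, g in zip(b_new, SXY_NEW))
s2_new = (SYY_NEW - reduction) / DEV_DF
se_new = [math.sqrt(s2_new * C[i][i]) for i in range(3)]
t_new = [bi / se for bi, se in zip(b_new, se_new)]

print("b' =", [round(v, 4) for v in b_new])
print(f"reduction = {reduction:.0f}, s'^2 = {s2_new:.1f}")
print("s.e. =", [round(v, 3) for v in se_new], " t =", [round(v, 2) for v in t_new])
```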

The formulas for the standard errors of the estimates in multiple regression studies are illustrated in examples 13.11.1 to 13.11.3.

EXAMPLE 13.11.1 — For soil sample 17, the predicted $\hat Y$ was 119 ppm, and the $x_i$ were: $x_1 = 14.9$, $x_2 = 15.9$, $x_3 = 79$. Find 95% limits for the population mean $\mu$ of Y. Ans. The variance of $\hat Y$ as an estimate of $\mu$ is

$$s^2\left[\frac{1}{n} + \sum_i c_{ii}x_i^2 + 2\sum_{i<j} c_{ij}x_ix_j\right] = s^2\left[\frac{1}{n} + \sum_i\sum_j c_{ij}x_ix_j\right]$$

The expression in the c's is conveniently computed as follows:

            $c_{i1}$      $c_{i2}$      $c_{i3}$      $x_i$     $\Sigma_j c_{ij}x_j$
            .0007249     -.0002483     -.0000010      14.9      .006774
           -.0002483      .0004375     -.0000330      15.9      .000650
           -.0000010     -.0000330      .0000313      79.0      .001933
$x_j$       14.9          15.9          79.0                    $\Sigma\Sigma c_{ij}x_ix_j = 0.2640$

Border the $c_{ij}$ matrix with a row and a column of the x's. Multiply each row of the $c_{ij}$ in turn by the $x_j$, giving the sums of products 0.006774, etc. Then multiply this column by the $x_i$, giving the sum of products 0.2640. Since $n = 18$ and $s^2 = 399$, this gives

$$s_{\hat Y}^2 = (399)(0.0556 + 0.2640) = 127.5,\qquad s_{\hat Y} = 11.3$$

With $t_{.05} = 2.145$, the limits are $119 \pm (2.145)(11.3)$: 95 to 143 ppm.

EXAMPLE 13.11.2 — If we are estimating Y for an individual new observation, the standard error of the estimate $\hat Y$ is

$$s\sqrt{1 + \frac{1}{n} + \sum_i c_{ii}x_i^2 + 2\sum_{i<j} c_{ij}x_ix_j}$$

Verify that for a soil with the X-values of soil 17, the s.e. would be ±22.9 ppm.
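The bordering calculation of example 13.11.1, and the corresponding standard error for a single new observation in example 13.11.2, can be sketched as follows. All inputs (the decoded $c_{ij}$, the $x_i$ of soil 17, $n = 18$, $s^2 = 399$, $t_{.05} = 2.145$) are taken from the text.

```python
import math

C = [
    [0.0007249, -0.0002483, -0.0000010],
    [-0.0002483, 0.0004375, -0.0000330],
    [-0.0000010, -0.0000330, 0.0000313],
]
x = [14.9, 15.9, 79.0]          # deviations x_i for soil sample 17
n, s2, t05 = 18, 399.0, 2.145
y_hat = 119.0                   # predicted Y for soil 17, ppm

# Border the c matrix with the x's: row sums of c_ij x_j, then weight them by x_i.
row_sums = [sum(cij * xj for cij, xj in zip(row, x)) for row in C]
quad = sum(xi * r for xi, r in zip(x, row_sums))      # sum_i sum_j c_ij x_i x_j

se_mean = math.sqrt(s2 * (1 / n + quad))              # example 13.11.1
se_new = math.sqrt(s2 * (1 + 1 / n + quad))           # example 13.11.2
lower, upper = y_hat - t05 * se_mean, y_hat + t05 * se_mean
print(f"quad = {quad:.4f}, s.e.(mean) = {se_mean:.1f}, s.e.(new) = {se_new:.1f}")
print(f"95% limits for the population mean: {lower:.0f} to {upper:.0f} ppm")
```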
EXAMPLE 13.11.3 — The following data, kindly provided by Dr. Gene M. Smith, come from a class of 66 students of nursing. Y represents the students' score in an examination on theory, $X_1$ the rank in high school (a high value being good), $X_2$ the score on a verbal aptitude test, and $X_3$ a measure of strength of character. The sums of squares and products (65 d.f.) are as follows:

            $\Sigma x_ix_j$                      $\Sigma x_iy$     $\Sigma y^2$
$X_1$       24,633      2,212       5,865        925.3             670.3
$X_2$                   7,760       2,695        745.9
$X_3$                               28,432       1,537.8

(i) Show that the regression coefficients and their standard errors are as follows:

$b_1 = 0.0206 \pm 0.0192$; $b_2 = 0.0752 \pm 0.0340$; $b_3 = 0.0427 \pm 0.0180$

Which X variables are related to performance in theory?

(ii) Show that the F value for the three-variable regression is $F = 5.50$. What is the P value?

(iii) Verify that $R^2 = 0.210$.
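Example 13.11.3 can be checked by computing the inverse matrix directly. The sketch below uses a Gauss-Jordan inversion of our own (the name `invert` is hypothetical) in place of the worksheet of table 13.11.1; it then forms $b_i = \sum_j c_{ij}\Sigma x_jy$, the standard errors $s\sqrt{c_{ii}}$ with $s^2$ on $66 - 3 - 1 = 62$ d.f., the F value and $R^2$. Its coefficients may differ from the printed answers in the fourth decimal place.

```python
import math

# Example 13.11.3: 66 nursing students, sums of squares and products (65 d.f.).
SXX = [
    [24633.0, 2212.0, 5865.0],
    [2212.0, 7760.0, 2695.0],
    [5865.0, 2695.0, 28432.0],
]
SXY = [925.3, 745.9, 1537.8]
SYY, N, K = 670.3, 66, 3


def invert(a):
    """Gauss-Jordan inverse of a small nonsingular matrix, with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(m[r][p]))
        m[p], m[piv] = m[piv], m[p]
        scale = m[p][p]
        m[p] = [v / scale for v in m[p]]
        for r in range(n):
            if r != p:
                f = m[r][p]
                m[r] = [vr - f * vp for vr, vp in zip(m[r], m[p])]
    return [row[n:] for row in m]


C = invert(SXX)                                              # the c_ij
b = [sum(cij * g for cij, g in zip(row, SXY)) for row in C]
reduction = sum(bi * g for bi, g in zip(b, SXY))
dev_df = N - K - 1                                           # 62
s2 = (SYY - reduction) / dev_df
se = [math.sqrt(s2 * C[i][i]) for i in range(K)]
f_value = (reduction / K) / s2
r2 = reduction / SYY

for i in range(K):
    print(f"b{i + 1} = {b[i]:.4f} +/- {se[i]:.4f}  (t = {b[i] / se[i]:.2f})")
print(f"F = {f_value:.2f} with {K} and {dev_df} d.f.; R^2 = {r2:.3f}")
```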

13.12 — Deletion of an independent variable. After a regression is computed, the utility of a variable may be questioned and its omission proposed. Instead of carrying out the calculations anew, the regression coefficients and the inverse matrix in the reduced regression can be obtained more quickly by the following formulas (14). We suppose that $X_u$ is the variable to be omitted from a regression containing $X_1, \ldots, X_k$. Before omission, the Deviations mean square $s^2$ has $(n - k - 1)$ d.f.

When $X_u$ is omitted, the sum of squares of deviations from the fitted regression, $\Sigma d^2$, is increased by $b_u^2/c_{uu}$. The mean square of the deviations then becomes

$$s'^2 = \frac{\Sigma d^2 + b_u^2/c_{uu}}{n - k}$$

Further, the regression coefficients and the inverse multipliers become

$$b_i' = b_i - \frac{c_{iu}b_u}{c_{uu}}, \qquad c_{ij}' = c_{ij} - \frac{c_{iu}c_{ju}}{c_{uu}}$$
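The deletion formulas are easy to apply by machine. A minimal sketch, using the decoded $c_{ij}$ of example 13.11.1 and the $b_i$ of example 13.10.1 (indices start at 0, so $X_2$ is `u=1`; `delete_variable` is our own name). Dropping $X_2$ reproduces, to the figures given, the answers to examples 13.10.3 and 13.10.4.

```python
# Decoded c_ij and b_i for the phosphorus example (examples 13.11.1 and 13.10.1).
C = [
    [0.0007249, -0.0002483, -0.0000010],
    [-0.0002483, 0.0004375, -0.0000330],
    [-0.0000010, -0.0000330, 0.0000313],
]
b = [1.7848, -0.0834, 0.1611]
dev_ss, n, k = 5584.0, 18, 3        # sum d^2, sample size, X's before deletion


def delete_variable(b, c, u):
    """Section 13.12 formulas for omitting the variable with index u.

    Returns the new b's, the new c's (X_u removed) and the increase
    b_u^2 / c_uu in the sum of squares of deviations.
    """
    keep = [i for i in range(len(b)) if i != u]
    cuu = c[u][u]
    new_b = [b[i] - c[i][u] * b[u] / cuu for i in keep]
    new_c = [[c[i][j] - c[i][u] * c[j][u] / cuu for j in keep] for i in keep]
    increase = b[u] ** 2 / cuu
    return new_b, new_c, increase


new_b, new_c, increase = delete_variable(b, C, u=1)     # drop X2
s2_new = (dev_ss + increase) / (n - k)                  # now n - k = 15 d.f.
print("b1', b3' =", [round(v, 3) for v in new_b])      # example 13.10.3
print("S.S. due to X2 after X1, X3 =", round(increase))  # example 13.10.4
print("new Deviations mean square =", round(s2_new, 1))
```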

13.13 — Selection of variates for prediction. A related but more difficult problem arises when a regression is being constructed for purposes of prediction and it is thought that several of the X-variables, perhaps most of them, may contribute little or nothing to the accuracy of the prediction. For instance, we may start with 11 X-variables, but a suitable choice of three of them might give the best predictions. The problem is to decide how many variables to retain, and which ones.

The most thorough approach is to work out the regression of Y on every subset of the k X-variables, that is, on each variable singly, on every pair of variables, on every triplet, and so on. The subset that gives the smallest Deviations mean square $s^2$ could be chosen, though if this subset involved 9 variables and another subset with 3 variables looked almost as good, the latter might be preferred for simplicity. The drawback of this method is the amount of computation. The number of regressions to be computed is $2^k - 1$, or 2,047 for 11 X-variables. Even with an electronic computer, this approach is scarcely feasible if k is large.
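For a small k the exhaustive method is easy to program. The sketch below runs it on the bee data of example 13.10.7 (k = 3, so $2^3 - 1 = 7$ regressions), computing $s^2$ with $n - 1 - p$ d.f. for a subset of p variables; it uses `itertools.combinations` from the Python standard library, and the helper names are our own.

```python
from itertools import combinations

# Example 13.10.7 (bees, n = 44): sums of squares and products.
SXX = [
    [16.6840, 1.9279, 0.8240],
    [1.9279, 0.9924, 0.3351],
    [0.8240, 0.3351, 0.2248],
]
SXY = [1.5057, 0.5989, 0.1848]
SYY, N = 0.6831, 44


def solve(a, g):
    """Gaussian elimination with back substitution (A symmetric positive definite)."""
    k = len(g)
    m = [list(row) + [rhs] for row, rhs in zip(a, g)]
    for p in range(k):
        for r in range(p + 1, k):
            f = m[r][p] / m[p][p]
            for c in range(p, k + 1):
                m[r][c] -= f * m[p][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


def deviations_ms(subset):
    """Deviations mean square s^2 for the regression of Y on the X's in subset."""
    a = [[SXX[i][j] for j in subset] for i in subset]
    g = [SXY[i] for i in subset]
    reduction = sum(bi * gi for bi, gi in zip(solve(a, g), g))
    return (SYY - reduction) / (N - 1 - len(subset))


k = len(SXY)
subsets = [s for size in range(1, k + 1) for s in combinations(range(k), size)]
results = sorted((deviations_ms(s), s) for s in subsets)   # smallest s^2 first
for s2, s in results:
    print("X" + ", X".join(str(i + 1) for i in s), f"s^2 = {s2:.5f}")
print(len(subsets), "regressions for k = 3;", 2 ** 11 - 1, "for k = 11")
```

On the bee data the smallest $s^2$ comes from $X_1$ and $X_2$ together (about 0.00762), with $X_2$ alone nearly as good (about 0.00766), the kind of near-tie in which the simpler subset might be preferred.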

Two alternative approaches are the step up method and the step down method. In the step down method, the regression of Y on all k X-variables is calculated. The contribution of $X_i$ to the reduction in sum of squares of Y, after fitting the other variables, is $b_i^2/c_{ii}$. The variable $X_u$ for which this quantity is smallest is selected, and some rule is followed in deciding whether to omit $X_u$. One such rule is to omit $X_u$ if $b_u^2/s^2c_{uu} < 1$; others omit $X_u$ if $b_u$ is not significant at some chosen level. If $X_u$ is omitted, the regression of Y on the remaining $(k - 1)$ variables is computed, and the same rule is applied. The process continues until no variable qualifies for omission.

In the step up method we start with the regressions of Y on $X_1, \ldots, X_k$ taken singly. The variable giving the greatest reduction in sum of squares of Y is selected. Call this $X_1$. Then the bivariate regressions in which $X_1$ appears are worked out. The variate which gives the greatest additional reduction in sum of squares after fitting $X_1$ is selected. Call this $X_2$. All trivariate regressions that include both $X_1$ and $X_2$ are computed, and the variate that makes the greatest additional contribution to them is selected, and so on until this additional contribution $b_i^2/c_{ii}$ is too small to satisfy some rule for inclusion.
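A minimal sketch of both procedures on the bee data of example 13.10.7. The step-down stopping rule is the one quoted above ($b_u^2/s^2c_{uu} < 1$); for step-up we assume, as a mirror image, that $X_u$ enters only if $b_u^2/s^2c_{uu} \ge 1$ in the enlarged regression. Contributions $b_u^2/c_{uu}$ are obtained as differences between reductions in sum of squares rather than from the $c$'s, which gives the same value (section 13.12) at the cost of extra computation. Function names are our own.

```python
# Example 13.10.7 (bees, n = 44): sums of squares and products.
SXX = [[16.6840, 1.9279, 0.8240], [1.9279, 0.9924, 0.3351], [0.8240, 0.3351, 0.2248]]
SXY = [1.5057, 0.5989, 0.1848]
SYY, N = 0.6831, 44


def solve(a, g):
    """Gaussian elimination with back substitution (A symmetric positive definite)."""
    k = len(g)
    m = [list(row) + [rhs] for row, rhs in zip(a, g)]
    for p in range(k):
        for r in range(p + 1, k):
            f = m[r][p] / m[p][p]
            for c in range(p, k + 1):
                m[r][c] -= f * m[p][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


def reduction(subset):
    """Reduction in sum of squares of Y due to regression on the X's in subset."""
    if not subset:
        return 0.0
    a = [[SXX[i][j] for j in subset] for i in subset]
    g = [SXY[i] for i in subset]
    return sum(bi * gi for bi, gi in zip(solve(a, g), g))


def s2(subset):
    """Deviations mean square, with n - 1 - p degrees of freedom."""
    return (SYY - reduction(subset)) / (N - 1 - len(subset))


def step_down(rule=1.0):
    """Drop the X with smallest b_u^2/c_uu while b_u^2/(s^2 c_uu) < rule."""
    current = list(range(len(SXY)))
    while current:
        # b_u^2/c_uu is the loss in reduction when X_u is removed.
        contrib = {u: reduction(current) - reduction([v for v in current if v != u])
                   for u in current}
        u = min(contrib, key=contrib.get)
        if contrib[u] / s2(current) >= rule:
            break
        current.remove(u)
    return current


def step_up(rule=1.0):
    """Add the X giving the greatest extra reduction while it passes the rule."""
    current, remaining = [], list(range(len(SXY)))
    while remaining:
        gain = {u: reduction(current + [u]) - reduction(current) for u in remaining}
        u = max(gain, key=gain.get)
        if gain[u] / s2(current + [u]) < rule:
            break
        current.append(u)
        remaining.remove(u)
    return sorted(current)


print("step down keeps:", [f"X{i + 1}" for i in step_down()])
print("step up selects:", [f"X{i + 1}" for i in step_up()])
```

On these data both procedures end with $X_1$ and $X_2$; with a stricter threshold such as `rule=4.0` (roughly $|t| \ge 2$) both keep only $X_2$.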

It is known that the step up and the step down methods will not necessarily select the same X-variables, and that neither method is guaranteed to find the same variables as the exhaustive method of investigating every subset. Striking differences appear mainly when the X-variables are highly correlated. The differences are not necessarily alarming, because when intercorrelations are high, different subsets can give almost equally good predictions. Fuller accounts of these methods, with illustrations, appear in (15, 16).

Two aspects of this problem require further research. For a given approach, e.g., the step down method, the best rule to use in deciding whether to omit an X-variate is not clear. Naturally, all simple rules reject $X_1$ if at some stage $b_1^2/c_{11}$ is small enough. Suppose that $\beta_1 = +1$. Then $X_1$ may be rejected because this sample gave an unusually low estimate of $\beta_1$, say 0.3. Nevertheless, with $\beta_1 = +1$, a prediction formula that includes a term $0.3X_1$ may give better predictions in the population than one which has no term in $X_1$. For this reason some writers recommend retaining the term in $X_1$ if the investigator is confident from his knowledge of the mechanism involved that $\beta_1$ must be positive and if $b_1$ is also positive.

Secondly, these methods tend to select variables that happen to do unusually well in the sample. When applied to new material, a prediction formula selected in this way will not predict as accurately as the value of $s^2$ suggests, especially if the sample is small and many X's have been rejected. More information is needed on the extent of this loss of accuracy.

13.14 — The discriminant function. This is a multivariate technique 
for studying the extent to which different populations overlap one another 
or diverge from one another. It has three principal types of use. 

1. Classification and diagnosis. The doctor’s records of a person’s 
symptoms and of his physical and laboratory measurements are taken 
to guide the doctor as to the particular disease from which the person is 
suffering. With two diseases that are often confused, it is helpful to learn
what measurements are most effective in distinguishing between the 
conditions, how best to combine the measurements, and how successfully 
the distinction can be made. 

2. In the study of the relations between populations. For example,
to what extent do the aptitudes and attitudes of a competent architect 
differ from those of a competent engineer or a competent banker? Do 
non-smokers, cigarette smokers, pipe smokers, and cigar smokers differ 
markedly or only negligibly in their psychological traits? 

3. As a multivariate generalization of the t-test. Given a number of 
related measurements made on each of two groups, the investigator may 
want a single test of the null hypothesis that the two populations have the 
same means with respect to all the measurements. 

Historically, it is interesting that the discriminant function was de- 
veloped independently by Fisher (17), whose primary interest was in 
classification, by Mahalanobis (18), in connection with a large study of 
the relations between Indian castes and tribes, and by Hotelling (19), 
who produced the multivariate t-test.

This introduction is confined to the case of two populations. Consider first a single variate X, normally distributed, with known means $\mu_1$, $\mu_2$ in the two populations and known standard deviation $\sigma$, assumed the same in both populations. The value of X is measured for a new specimen that belongs to one of the two populations. Our task is to classify the specimen into the correct population. If $\mu_1 < \mu_2$, a natural classification rule is to assign the specimen to population I if $X < (\mu_1 + \mu_2)/2$ and to population II if $X > (\mu_1 + \mu_2)/2$. The midpoint of the two population means serves as the boundary point.

How often will we make a mistake? If the specimen actually comes from population I, our verdict is wrong whenever $X > (\mu_1 + \mu_2)/2$; that is, whenever

$$\frac{X - \mu_1}{\sigma} > \frac{\frac{\mu_1 + \mu_2}{2} - \mu_1}{\sigma} = \frac{\mu_2 - \mu_1}{2\sigma} = \frac{\delta}{2\sigma}$$

where $\delta = \mu_2 - \mu_1$ is the distance between the two means.
Since $(X - \mu_1)/\sigma$ follows the standard normal distribution, the probability of misclassification is the area of the normal tail from $\delta/2\sigma$ to $\infty$. It is easily seen that the same probability of misclassification holds for a specimen from population II. Some values of this probability for given $\delta/\sigma$ are as follows:


$\delta/\sigma$       0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0
Probability (%)       40.1   30.8   22.7   15.9   10.6   6.7    4.0    2.3
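The tabled probabilities are upper-tail areas of the standard normal distribution at $\delta/2\sigma$. A minimal sketch using `math.erfc` from the Python standard library, through the relation $P(Z > z) = \tfrac12\,\mathrm{erfc}(z/\sqrt2)$; its output agrees with the table to within rounding in the last digit.

```python
import math


def misclassification(delta_over_sigma):
    """Probability of misclassification: normal tail area beyond delta / (2 sigma)."""
    z = delta_over_sigma / 2
    return 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z) for a standard normal Z


for ratio in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]:
    print(f"delta/sigma = {ratio:.1f}: {100 * misclassification(ratio):.1f}%")
```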


For a high degree of accuracy in classification, $\delta/\sigma$ must exceed 3. The same quantity $\delta/\sigma$ can be used as an index of the degree of overlap between the two populations: it is sometimes called the distance between the populations.

In some classification problems it is known from experience that 
specimens come more frequently from one population than from the 
other. Further, misclassifying a specimen that actually comes from popu- 
lation I may have more serious consequences than misclassifying a speci- 
men from population II. If these relative frequencies and relative costs 
of mistakes are known, the boundary point is shifted to a value at which 
the average cost of mistakes is minimized (20). 

We come now to the multivariate case. The variates $X_1, \ldots, X_k$ are assumed to follow a multivariate normal distribution. The variance $\sigma_{ii}$ of $X_i$ and the covariance $\sigma_{ij}$ of $X_i$ and $X_j$ are assumed to be the same in both populations. Of course, $\sigma_{ii}$ is not assumed to be the same from one variate to another, nor $\sigma_{ij}$ from one pair of variates to another. The symbol $\delta_i = \mu_{2i} - \mu_{1i}$ denotes the difference between the means of the two populations for $X_i$.

The linear discriminant function $\Sigma L_iX_i$ may be defined as the linear function of the $X_i$ that gives the smallest probability of misclassification. The $L_i$ are coefficients that will be determined in order to satisfy this requirement. Since the $X_i$ follow a multivariate normal distribution, it is known from theory that $\Sigma L_iX_i$ is normally distributed. The difference between its means in the two populations is $\delta = \Sigma L_i\delta_i$ and its variance is

$$\sigma^2 = \sum_i\sum_j L_iL_j\sigma_{ij}$$

From the earlier discussion for a single variate, it is clear that we must maximize the absolute value of $\delta/\sigma$ in order to minimize the probability of misclassification. To avoid the question of signs, the $L_i$ are chosen so as to maximize $\delta^2/\sigma^2$; that is,

$$\Delta^2 = \frac{\left(\sum_i L_i\delta_i\right)^2}{\sum_i\sum_j L_iL_j\sigma_{ij}} \qquad (13.14.1)$$

The quantity $\Delta^2$ is called the generalized squared distance. By calculus the $L_i$ are found to be the solutions of the set of k equations

$$\begin{aligned}
\sigma_{11}L_1 + \sigma_{12}L_2 + \cdots + \sigma_{1k}L_k &= \delta_1\\
&\;\;\vdots\\
\sigma_{k1}L_1 + \sigma_{k2}L_2 + \cdots + \sigma_{kk}L_k &= \delta_k
\end{aligned} \qquad (13.14.2)$$

An interesting consequence of the solution is that $\Delta^2 = \Sigma L_i\delta_i$ when the optimum $L_i$ from (13.14.2) are inserted.
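The identity $\Delta^2 = \Sigma L_i\delta_i$, and the fact that no other choice of L's gives a larger value of (13.14.1), can be checked numerically. For illustration only, we borrow the pooled within-sample mean squares and products of the soil data of section 13.15 (sums of squares divided by 284) as if they were the population $\sigma_{ij}$, with the sample differences as the $\delta_i$; that borrowing, the random perturbations, and the function names are our own assumptions.

```python
import random

# Within-sample sums of squares/products and mean differences (section 13.15).
S = [[1.111, 0.229, 0.198], [0.229, 1.043, 0.051], [0.198, 0.051, 2.942]]
DELTA = [0.1408, 0.0821, 0.0826]
SIGMA = [[s / 284 for s in row] for row in S]   # mean squares, used as the sigma_ij


def solve(a, g):
    """Gaussian elimination with back substitution (A symmetric positive definite)."""
    k = len(g)
    m = [list(row) + [rhs] for row, rhs in zip(a, g)]
    for p in range(k):
        for r in range(p + 1, k):
            f = m[r][p] / m[p][p]
            for c in range(p, k + 1):
                m[r][c] -= f * m[p][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


def ratio(L):
    """delta^2 / sigma^2 of the linear function sum L_i X_i, as in (13.14.1)."""
    num = sum(li * di for li, di in zip(L, DELTA)) ** 2
    den = sum(L[i] * L[j] * SIGMA[i][j] for i in range(3) for j in range(3))
    return num / den


L = solve(SIGMA, DELTA)                          # equations (13.14.2)
delta2 = sum(li * di for li, di in zip(L, DELTA))
print(f"sum L_i delta_i = {delta2:.4f}   (13.14.1) at the optimum = {ratio(L):.4f}")

# Randomly perturbed coefficients should never give a larger ratio.
rng = random.Random(2024)
best_other = max(ratio([li + rng.gauss(0.0, 5.0) for li in L]) for _ in range(1000))
print(f"largest ratio among 1,000 perturbed sets of L's: {best_other:.4f}")
```

With these figures $\Delta^2 \approx 6.19$, whose square root is the d/s of about 2.49 obtained for the soil discriminant in section 13.15.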

The estimation of the linear discriminant function from sample data is 
illustrated in section 13.15 below. 

13.15 — Numerical example of the discriminant function. This example, due to Cox and Martin (22), uses data from a study of the distribution of Azotobacter in Iowa soils. The question is: how well can the presence or absence of Azotobacter be predicted from three chemical measurements on the soil? The measurements are:

$X_1$ = soil pH

$X_2$ = amount of readily available phosphate

$X_3$ = total nitrogen content

The data consist of 100 soils containing no Azotobacter and 186 soils containing Azotobacter. For ease in calculation, the data were coded by dividing by 10, 1,000, and 100, respectively. The original data will not be given here.

It is always advisable to look first at the discriminating powers of the individual variates. The Within Sample mean squares $s^2$ (284 d.f.) and the $d_i$ (differences between the sample means) were computed for each variate. The ratios d/s were 2.37 for $X_1$, 1.36 for $X_2$, and 0.81 for $X_3$. Evidently $X_1$ is the best single variate, giving a probability of misclassification of 11.8%, while $X_3$ is poor by itself. A result worth noting is that if the variates were independent, the value of d/s given by the discriminant function would be simply $\sqrt{\Sigma(d_i/s_i)^2}$, or in this example $\sqrt{8.12} = 2.85$, with an error rate of about 7.7%. In practical applications, correlations between the X's usually have the effect of making the discriminant function less accurate (21). If the computed discriminant appears to give a d/s much greater than the value obtained by assuming independence, the computations should be checked.
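The single-variate error rates and the independence value can be recomputed in the same way, as a normal tail area at half the value of d/s. A minimal sketch; the dictionary labels are our own.

```python
import math


def misclassification(d_over_s):
    """Normal-tail probability beyond (d/s)/2: the chance of misclassification."""
    return 0.5 * math.erfc(d_over_s / 2 / math.sqrt(2))


single = {"X1 (pH)": 2.37, "X2 (phosphate)": 1.36, "X3 (nitrogen)": 0.81}
for name, ratio in single.items():
    print(f"{name}: d/s = {ratio:.2f}, error rate = {100 * misclassification(ratio):.1f}%")

# If the variates were independent, the squared d/s values would simply add.
combined = math.sqrt(sum(r ** 2 for r in single.values()))
print(f"independence: d/s = {combined:.2f}, "
      f"error rate = {100 * misclassification(combined):.1f}%")
```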

To compute the discriminant, find the pooled Within Sample sums of squares $S_{ii}$ and sums of products $S_{ij}$. If the sample sizes are $n_1$, $n_2$, the degrees of freedom are $(n_1 + n_2 - 2)$. In line with equations (13.14.2), the normal equations to be solved are as follows:


$$\begin{aligned}
S_{11}L_1 + S_{12}L_2 + \cdots + S_{1k}L_k &= d_1\\
&\;\;\vdots\\
S_{k1}L_1 + S_{k2}L_2 + \cdots + S_{kk}L_k &= d_k
\end{aligned} \qquad (13.15.1)$$


(If we were to copy [13.14.2] as closely as possible, the mean squares and products $s_{ij}$ would be used in [13.15.1] instead of the $S_{ij}$, but the $S_{ij}$ give the same results in the end and are easier to use.)

Equations (13.15.1) obviously resemble the normal equations for the regression coefficients in multiple regression. The $L_i$ take the place of the $b_i$, and the $d_i$ of the $\Sigma x_iy$. The resemblance can be increased by constructing a dummy variable Y, which has the value $+1/n_2$ for every member of sample 2 and $-1/n_1$ for every member of sample 1. It follows that $\Sigma X_iY = \Sigma x_iy = d_i$. Thus, formally, the discriminant function can be regarded as the multiple regression of this dummy Y on $X_1, \ldots, X_k$. If we knew Y for any specimen we would know the population to which the specimen belongs. Consequently, it is reasonable that the discriminant function should try to predict Y as accurately as possible.
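The dummy-variable argument can be checked on a small made-up data set (the numbers below are hypothetical, not from the soil study). The sketch verifies that $\Sigma x_iy = d_i$ for the dummy Y. Because a regression fitted to both samples together uses the total rather than the Within Sample sums of squares and products, its coefficients come out proportional to the $L_i$ rather than equal to them; the constant factor does not change the discriminant apart from scale. Function names are our own.

```python
# Hypothetical two-sample data with two variates, invented for illustration only.
sample1 = [(1.0, 2.0), (2.0, 1.5), (1.5, 3.0), (2.5, 2.5)]
sample2 = [(3.0, 2.0), (3.5, 3.5), (4.0, 2.5)]
n1, n2 = len(sample1), len(sample2)


def means(rows):
    return [sum(r[i] for r in rows) / len(rows) for i in range(2)]


def ssp(rows, centre):
    """2 x 2 sums of squares and products of deviations from centre."""
    return [[sum((r[i] - centre[i]) * (r[j] - centre[j]) for r in rows)
             for j in range(2)] for i in range(2)]


def solve2(a, g):
    """Solve a 2 x 2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(g[0] * a[1][1] - a[0][1] * g[1]) / det,
            (a[0][0] * g[1] - a[1][0] * g[0]) / det]


m1, m2 = means(sample1), means(sample2)
d = [m2[i] - m1[i] for i in range(2)]

# Discriminant from the pooled Within Sample S_ij, equations (13.15.1).
w1, w2 = ssp(sample1, m1), ssp(sample2, m2)
W = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]
L = solve2(W, d)

# Dummy Y = -1/n1 in sample 1 and +1/n2 in sample 2, regressed on X over both samples.
rows = sample1 + sample2
y = [-1 / n1] * n1 + [1 / n2] * n2
centre = means(rows)
T = ssp(rows, centre)                                  # total S.S. and S.P.
sxy = [sum((r[i] - centre[i]) * yj for r, yj in zip(rows, y)) for i in range(2)]
b = solve2(T, sxy)

print("d =", [round(v, 4) for v in d], "  sum x_i y =", [round(v, 4) for v in sxy])
print("L =", [round(v, 4) for v in L], "  b =", [round(v, 4) for v in b])
print("b_i / L_i =", [round(bi / li, 6) for bi, li in zip(b, L)])
```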

For the two sets of soils the normal equations are:

$$\begin{aligned}
1.111L_1 + 0.229L_2 + 0.198L_3 &= 0.1408\\
0.229L_1 + 1.043L_2 + 0.051L_3 &= 0.0821\\
0.198L_1 + 0.051L_2 + 2.942L_3 &= 0.0826
\end{aligned}$$

The $L_i$, computed by the method of section 13.10, are:

$$L_1 = 0.11229,\quad L_2 = 0.05310,\quad L_3 = 0.01960$$

The value of d/s for the discriminant is given by the formula:

$$\sqrt{(n_1 + n_2 - 2)\Sigma L_id_i} = \sqrt{(284)(0.02179)} = \sqrt{6.188} = 2.49$$

This gives an estimated probability of misclassification of 10.6%. In these data the combined discriminant is not much better than pH alone.
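A sketch of the whole computation for the soil data, solving (13.15.1) by plain elimination rather than by the worksheet of section 13.10 (the helper `solve` is our own). Small differences from the printed $L_i$ in the fifth decimal place are to be expected.

```python
import math

# Normal equations (13.15.1) for the Azotobacter soils (coded data).
S = [[1.111, 0.229, 0.198], [0.229, 1.043, 0.051], [0.198, 0.051, 2.942]]
d = [0.1408, 0.0821, 0.0826]
n1, n2 = 100, 186


def solve(a, g):
    """Gaussian elimination with back substitution (A symmetric positive definite)."""
    k = len(g)
    m = [list(row) + [rhs] for row, rhs in zip(a, g)]
    for p in range(k):
        for r in range(p + 1, k):
            f = m[r][p] / m[p][p]
            for c in range(p, k + 1):
                m[r][c] -= f * m[p][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


L = solve(S, d)
sum_ld = sum(li * di for li, di in zip(L, d))
d_over_s = math.sqrt((n1 + n2 - 2) * sum_ld)                # d/s of the discriminant
error_rate = 0.5 * math.erfc(d_over_s / 2 / math.sqrt(2))   # tail beyond (d/s)/2
print("L =", [round(v, 5) for v in L])
print(f"sum L_i d_i = {sum_ld:.5f}, d/s = {d_over_s:.2f}, error rate = {100 * error_rate:.1f}%")
```

The printed error rate is about 10.7%, close to the 10.6% quoted in the text.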


TABLE 13.15.1

Analysis of Variance of the Discriminant Function. Hotelling's $T^2$ Test

Source of Variation   Degrees of Freedom   Sum of Squares                                  Mean Square
Between soils         3                    $n_1n_2(\Sigma Ld)^2/(n_1 + n_2) = 0.03088$     0.01029
Within soils          282                  $\Sigma Ld = 0.02179$                           0.0000773

$F = 0.01029/0.0000773 = 133.1$


The multivariate t-test, Hotelling's $T^2$ test, is made in table 13.15.1 from an analysis of variance of the variate $\Sigma L_iX_i$ into “Between Samples” and “Within Samples.” On multiplying equations (13.15.1) by $L_1, L_2, \ldots, L_k$ and adding, we have the result:

Within Samples sum of squares $= \sum_i\sum_j L_iL_jS_{ij} = \Sigma L_id_i$

The “Between Samples” sum of squares $= n_1n_2(\Sigma L_id_i)^2/(n_1 + n_2)$

Note the d.f.: k for Between Samples and $(n_1 + n_2 - k - 1)$ for Within Samples. The allocation of k d.f. to Between Samples allows for the fact that the L's were chosen to maximize the ratio of the Between Samples S.S. to the Within Samples S.S. The value of F, 133.1, with 3 and 282 d.f., is very large, as it must be if the discriminant is to be effective in classification.
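Table 13.15.1 can be rebuilt from the $L_i$ and $d_i$ alone. A minimal sketch using the printed values; variable names are our own.

```python
n1, n2, k = 100, 186, 3
L = [0.11229, 0.05310, 0.01960]
d = [0.1408, 0.0821, 0.0826]

sum_ld = sum(li * di for li, di in zip(L, d))      # Within Samples S.S.
between_ss = n1 * n2 * sum_ld ** 2 / (n1 + n2)     # Between Samples S.S.
between_df, within_df = k, n1 + n2 - k - 1
between_ms = between_ss / between_df
within_ms = sum_ld / within_df
f_value = between_ms / within_ms
print(f"Between: {between_ss:.5f} / {between_df} = {between_ms:.5f}")
print(f"Within:  {sum_ld:.5f} / {within_df} = {within_ms:.7f}")
print(f"F = {f_value:.1f} with {between_df} and {within_df} d.f.")
```

The sketch prints F = 133.2; the table's 133.1 comes from dividing the rounded mean squares 0.01029 and 0.0000773.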

The assumption that the covariance matrix is the same in both populations is rather sweeping. If there appear to be moderate differences between the matrices in the two populations and if $n_1$ and $n_2$ are unequal, it is better when computing the coefficients $L_i$ to replace the sums of squares and products by the unweighted averages of the variances or covariances in the two samples. If this is done, note that the value of d/s for the discriminant becomes $\sqrt{\Sigma L_id_i}$, while in table 13.15.1, $\Sigma L_id_i$ becomes the Within Samples mean square. The expression for the Between Samples sum of squares remains as in table 13.15.1. When the covariance matrices differ substantially, the best discriminant is a quadratic expression in the X's. Smith (23) presents an example of this case.

For classification studies involving more than two populations, see 
Rao (20). Examples are given in (24, 25) for qualitative data, in which the 
assumption of a multivariate normal population does not apply. 
★ CHAPTER FOURTEEN

Analysis of covariance


14.1 — Introduction. The analysis of covariance is a technique that combines the features of analysis of variance and regression. In a one-way classification, the typical analysis of variance model for the value $Y_{ij}$ of the jth observation in the ith class is

$$Y_{ij} = \mu_i + e_{ij}$$

where the $\mu_i$ represent the population means of the classes and the $e_{ij}$ are the residuals. But suppose that on each unit we have also measured another variable $X_{ij}$ that is linearly related to $Y_{ij}$. It is natural to set up the model

$$Y_{ij} = \mu_i + \beta(X_{ij} - \bar X_{..}) + \epsilon_{ij}$$

where $\beta$ is the regression coefficient of Y on X. This is a typical model for the analysis of covariance. If X and Y are closely related, we may expect this model to fit the $Y_{ij}$ values better than the original analysis of variance model. That is, the residuals $\epsilon_{ij}$ should be in general smaller than the $e_{ij}$.

The model extends easily to more complex situations. With a two-way classification, as in a randomized blocks experiment, the model is

$$Y_{ij} = \mu + \alpha_i + \rho_j + \beta(X_{ij} - \bar X_{..}) + \epsilon_{ij}$$

With a one-way classification and two auxiliary variables $X_{1ij}$ and $X_{2ij}$, both linearly related to $Y_{ij}$, we have

$$Y_{ij} = \mu_i + \beta_1(X_{1ij} - \bar X_{1..}) + \beta_2(X_{2ij} - \bar X_{2..}) + \epsilon_{ij}$$

The analysis of covariance has numerous uses. 

1. To increase precision in randomized experiments. In such applications the covariate X is a measurement, taken on each experimental unit before the treatments are applied, that predicts to some degree the final response Y on the unit. In the earliest application suggested by Fisher (1), the $Y_{ij}$ were the yields of tea bushes in an experiment. An important source of error is that by the luck of the draw, some treatments will have been allotted to a more productive set of bushes than others. The $X_{ij}$ were the previous yields of the bushes in a period before treatments were applied. Since the relative yields of tea bushes show a good deal of stability from year to year, the $X_{ij}$ serve as predictors of the inherent yielding abilities of the bushes. By adjusting the treatment mean yields so as to remove these differences in yielding ability, we obtain a lower experimental error and more precise comparisons among the treatments. This is probably the commonest use of covariance.

2. To adjust for sources of bias in observational studies. An investigator is studying the relation between obesity in workers and the physical activity required in their occupations. He has measures of obesity $Y_{ij}$ in samples of workers from each of a number of occupations. He has also recorded the age $X_{ij}$ of each worker, and notices that there are differences between the mean ages of the workers in different occupations. If obesity is linearly related to age, differences found in obesity among different occupations may be due in part to these age differences. Consequently he introduces the term $\beta(X_{ij} - \bar X_{..})$ into his model in order to adjust for a possible source of bias in his comparison among occupations.

3. To throw light on the nature of treatment effects in randomized experiments. In an experiment on the effects of soil fumigants on nematodes, which attack some farm crops, significant differences between fumigants were found both in the numbers of nematode cysts $X_{ij}$ and in the yields $Y_{ij}$ of the crop. This raises the question: Can the differences in yields be ascribed to the differences in numbers of nematodes? One way of examining this question is to see whether treatment differences in yields remain, or whether they shrink to insignificance, after adjusting for the regression of yields on nematode numbers.

4. To study regressions in multiple classifications . For example, an 
investigator is studying the relation between expenditure per student in 
schools (Y) and per capita income (X) in large cities. If he has data for
a large number of cities for each of four years, he may want to examine 
whether the relation is the same in different sections of the country, or 
whether it remains the same from year to year. Sometimes the question 
is whether the relation is straight or curved. 

14.2 — Covariance in a completely randomized experiment. We begin 
with a simple example of the use of covariance in increasing precision in 
randomized experiments. With a completely randomized design, the 
data form a one-way classification, the treatments being the classes. In 
the model

$$Y_{ij} = \mu_i + \beta(X_{ij} - \bar X_{..}) + \epsilon_{ij},$$

the $\mu_i$ represent the effects of the treatments. The observed mean for the ith treatment is

$$\bar Y_{i.} = \mu_i + \beta(\bar X_{i.} - \bar X_{..}) + \bar\epsilon_{i.}$$

Thus $\bar Y_{i.}$ is an unbiased estimate of $\mu_i + \beta(\bar X_{i.} - \bar X_{..})$. It follows that as an estimate of $\mu_i$ we use

$$\hat\mu_i = \bar Y_{i.} - \beta(\bar X_{i.} - \bar X_{..}),$$

the second term on the right being the adjustment introduced by the covariance analysis. The adjustment accords with common sense. For instance, suppose we were told that in the previous year the tea bushes receiving Treatment 1 yielded 20 pounds more than the average over the experiment. If the regression coefficient of Y on X was 0.4, meaning that each pound of increase in X corresponds to 0.4 pound of increase in Y, we would decrease the observed Y mean by (0.4)(20) = 8 pounds in order to make Treatment 1 more comparable to the other treatments. In this illustration the figure 0.4 is $\beta$ and the figure 20 is $(\bar X_{i.} - \bar X_{..})$.

There remains the problem of estimating $\beta$ from the results of the experiment. In a single sample you may recall that the regression coefficient is estimated by $b = \Sigma xy/\Sigma x^2$, and that the reduction in sum of squares of Y due to the regression is $(\Sigma xy)^2/\Sigma x^2$. These results continue to hold in multiple classifications (completely randomized, randomized blocks, and Latin square designs) except that $\beta$ is estimated from the Error line in the analysis of variance. We may write $b = E_{xy}/E_{xx}$. The Error sum of squares of X in the analysis of variance, $E_{xx}$, is familiar, but the quantity $E_{xy}$ is new. It is the Error sum of products of X and Y. A numerical example will clarify it.

The data in table 14.2.1 were selected from a larger experiment on 
the use of drugs in the treatment of leprosy at the Eversley Childs Sani- 
tarium in the Philippines. On each patient six sites on the body at which 
leprosy bacilli tend to congregate were selected. The variate X, based on 
laboratory tests, is a score representing the abundance of leprosy bacilli 
at these sites before the experiment began. The variate Y is a similar score 
after several months of treatment. Drugs A and D are antibiotics while 
drug F is an inert drug included as a control. Ten patients were selected 
for each treatment for this example. 

The first step is to compute the analysis of sums of squares and products, shown under the table. In the columns headed $\sum x^2$ and $\sum y^2$, we analyze X and Y in the usual way into “Between drugs” and “Within drugs.” For the $\sum xy$ column, make the corresponding analysis of the products of X and Y, as follows:

Total: $(11)(6) + (8)(0) + \cdots + (12)(20) - \dfrac{(322)(237)}{30} = 731.2$

Between drugs: $\dfrac{(93)(53) + (100)(61) + (129)(123)}{10} - \dfrac{(322)(237)}{30} = 145.8$
TABLE 14.2.1
Scores for Leprosy Bacilli Before (X) and After (Y) Treatment

                        Drugs
              A             D             F
           X     Y       X     Y       X     Y
          11     6       6     0      16    13
           8     0       6     2      13    10
           5     2       7     3      11    18
          14     8       8     1       9     5
          19    11      18    18      21    23
           6     4       8     4      16    12
          10    13      19    14      12     5
           6     1       8     9      12    16
          11     8       5     1       7     1
           3     0      15     9      12    20
Totals    93    53     100    61     129   123      Overall: X 322, Y 237
Means    9.3   5.3    10.0   6.1    12.9  12.3      Overall: X 10.73, Y 7.90

Analysis of Sums of Squares and Products

Source                          d.f.     Σx²      Σxy       Σy²
Total                            29    665.9    731.2   1,288.7
Between drugs                     2     73.0    145.8     293.6
Within drugs (Error)             27    592.9    585.4     995.1
Reduction due to regression       1          (585.4)²/592.9 =     578.0
Deviations from regression       26                         417.1

Deviations mean square = 417.1/26 = 16.04


The Within drugs sum of products, 585.4, is found by subtraction. Note that any of these sums of products may be either positive or negative. The Within drugs (Error) sum of products 585.4 is the quantity we call $E_{xy}$, while the Error sum of squares of X, 592.9, is $E_{xx}$.
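As a check on the arithmetic, the following minimal Python sketch (standard library only) recomputes the Total, Between drugs, and Within drugs lines of the analysis from the scores in table 14.2.1. The data layout and the function name `ss_products` are our own illustrative choices, not part of the original text.

```python
# Table 14.2.1: (X, Y) scores for the ten patients on each drug.
DATA = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11), (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18), (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23), (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}


def ss_products(groups):
    """Return {line: (sum x^2, sum xy, sum y^2)} for the Total, Between, Within lines.

    `groups` maps a treatment name to its list of (X, Y) pairs.
    """
    pairs = [p for g in groups.values() for p in g]
    n = len(pairs)
    gx = sum(x for x, _ in pairs)
    gy = sum(y for _, y in pairs)
    # Correction terms (GX)^2/N, (GX)(GY)/N, (GY)^2/N.
    cxx, cxy, cyy = gx * gx / n, gx * gy / n, gy * gy / n

    total = (sum(x * x for x, _ in pairs) - cxx,
             sum(x * y for x, y in pairs) - cxy,
             sum(y * y for _, y in pairs) - cyy)

    # Between line: products of treatment totals, each divided by its group size.
    bxx = bxy = byy = 0.0
    for g in groups.values():
        tx, ty = sum(x for x, _ in g), sum(y for _, y in g)
        bxx += tx * tx / len(g)
        bxy += tx * ty / len(g)
        byy += ty * ty / len(g)
    between = (bxx - cxx, bxy - cxy, byy - cyy)

    # Within (Error) line by subtraction, as in the text.
    within = tuple(t - b for t, b in zip(total, between))
    return {"Total": total, "Between": between, "Within": within}


lines = ss_products(DATA)
for name, (sxx, sxy, syy) in lines.items():
    print(f"{name:8s} {sxx:8.1f} {sxy:8.1f} {syy:9.1f}")
```

Carried without intermediate rounding, the X column gives 72.9 for Between drugs and 593.0 for Within drugs; the 73.0 and 592.9 in the table come from subtracting rounded figures. The $\sum xy$ and $\sum y^2$ columns agree with the table.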

The reduction in the Error sum of squares of Y due to the regression is $E_{xy}^2/E_{xx}$ with 1 d.f. The Deviations mean square, 16.04 with 26 d.f., provides the estimate of error. The original Error mean square of Y is 995.1/27 = 36.86. The regression has produced a substantial reduction in the Error mean square.

The next step is to compute b and the adjusted means. We have $b = E_{xy}/E_{xx} = 585.4/592.9 = 0.987$. The adjusted means are as follows:

A: $\bar{Y}_{1.} - b(\bar{X}_{1.} - \bar{X}_{..}) = 5.3 - (0.987)(9.3 - 10.73) = 6.71$

D: $\bar{Y}_{2.} - b(\bar{X}_{2.} - \bar{X}_{..}) = 6.1 - (0.987)(10.0 - 10.73) = 6.82$

F: $\bar{Y}_{3.} - b(\bar{X}_{3.} - \bar{X}_{..}) = 12.3 - (0.987)(12.9 - 10.73) = 10.16$

The adjustments have improved the status of F, which happened to receive initially a set of patients with somewhat high scores.
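The regression coefficient, the reduction due to regression, the Deviations mean square, and the adjusted means can all be reproduced from the summary figures alone. The sketch below assumes the rounded values printed in the text ($E_{xx}$ = 592.9, $E_{xy}$ = 585.4, $E_{yy}$ = 995.1, overall mean of X = 10.73); the variable names are illustrative.

```python
# Summary quantities from the Analysis of Sums of Squares and Products (table 14.2.1).
E_XX, E_XY, E_YY = 592.9, 585.4, 995.1   # Error (Within drugs) line
ERROR_DF = 27
GRAND_MEAN_X = 10.73                      # overall mean of X

# Treatment means (X-bar, Y-bar) for each drug.
MEANS = {"A": (9.3, 5.3), "D": (10.0, 6.1), "F": (12.9, 12.3)}

b = E_XY / E_XX                           # error regression coefficient
reduction = E_XY ** 2 / E_XX              # S.S. removed by the regression, 1 d.f.
deviations_ss = E_YY - reduction          # Deviations from regression
s2_yx = deviations_ss / (ERROR_DF - 1)    # Deviations mean square, 26 d.f.

# Adjusted mean: Y-bar_i - b (X-bar_i - X-bar..)
adjusted = {k: my - b * (mx - GRAND_MEAN_X) for k, (mx, my) in MEANS.items()}

print(f"b = {b:.4f}, reduction = {reduction:.1f}, s2_yx = {s2_yx:.2f}")
for drug, value in adjusted.items():
    print(drug, round(value, 2))
```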
For tests of significance or confidence limits relating to the adjusted means, the error variance is derived from the mean square $s_{y \cdot x}^2 = 16.04$, with 26 d.f. Algebraically, the difference between the adjusted means of the ith and the jth treatments is

$$D = \bar{Y}_{i.} - \bar{Y}_{j.} - b(\bar{X}_{i.} - \bar{X}_{j.})$$

The formula for the estimated variance of D is

$$s_D^2 = s_{y \cdot x}^2 \left[ \frac{2}{n} + \frac{(\bar{X}_{i.} - \bar{X}_{j.})^2}{E_{xx}} \right] \tag{14.2.1}$$


where n is the sample size per treatment. The second term on the right is 
an allowance for the sampling error of b. 
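Formula (14.2.1) gives a separate standard error for each pair of treatments. A short sketch, assuming the leprosy figures above ($s_{y \cdot x}^2$ = 16.04, $E_{xx}$ = 592.9, n = 10), evaluates it for all three pairs; the helper `s_d` is a hypothetical name of our own.

```python
from itertools import combinations
from math import sqrt

S2_YX = 16.04      # Deviations mean square, 26 d.f.
E_XX = 592.9       # Error sum of squares of X
N = 10             # patients per treatment
MEAN_X = {"A": 9.3, "D": 10.0, "F": 12.9}


def s_d(xbar_i, xbar_j, s2=S2_YX, n=N, exx=E_XX):
    """Standard error of a difference between two adjusted means, formula (14.2.1)."""
    # 2/n is the usual term for two means; the second term allows for the error in b.
    return sqrt(s2 * (2 / n + (xbar_i - xbar_j) ** 2 / exx))


pairs = {(i, j): s_d(MEAN_X[i], MEAN_X[j]) for i, j in combinations(MEAN_X, 2)}
for (i, j), se in pairs.items():
    print(f"{i}-{j}: s_D = {se:.3f}")
```

The three values run from about 1.79 (A − D, whose X means are close) to about 1.89 (A − F, whose X means differ most), so the allowance for the sampling error of b changes $s_D$ only modestly here.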

This formula has the disadvantage that $s_D$ is different for every pair of treatments that are being compared. In practice, these differences are small if (i) there are at least 20 d.f. in the Error line of the analysis of variance, and (ii) the Treatments mean square for X is non-significant, as it should be since the X's were measured before treatments were assigned. In such cases an average value of $s_D^2$ may be used. By an algebraic identity (2) the average value of $s_D^2$, taken over every pair of treatments, is

$$\bar{s}_D^2 = \frac{2 s_{y \cdot x}^2}{n} \left[ 1 + \frac{t_{xx}}{E_{xx}} \right] \tag{14.2.2}$$

where $t_{xx}$ is the Treatments mean square for X. More generally, we may regard

$$s'^2 = s_{y \cdot x}^2 \left[ 1 + \frac{t_{xx}}{E_{xx}} \right] \tag{14.2.3}$$

as the effective Error mean square per observation when computing the error variance for any comparison among the treatment means.

In this experiment $t_{xx}$ = 73.0/2 = 36.5 (from table 14.2.1) and $E_{xx}$ = 592.9, giving $t_{xx}/E_{xx}$ = 0.0616. Hence,

$$s'^2 = (16.04)(1.0616) = 17.03, \qquad s' = 4.127$$

With 10 replicates this gives $s_D = s'\sqrt{2/10} = 4.127\sqrt{0.2} = 1.846$. The adjusted means for A and D, 6.71 and 6.82, show no sign of a real difference. The largest contrast, F − A, is 3.45, giving a t-value of 3.45/1.846 = 1.87, with 26 d.f., which is not significant at the 5% level.

After completing a covariance analysis, the experimenter is sure to ask: Is it worthwhile? The efficiency of the adjusted means relative to the unadjusted means is estimated by the ratio of the corresponding effective Error mean squares:

$$\frac{36.86}{17.03} = 2.16$$

Covariance with 10 replicates per treatment gives nearly as precise estimates as the unadjusted means with 21 replicates.
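The effective Error mean square (14.2.3), the average standard error of a difference (14.2.2), the t-value for F − A, and the efficiency ratio can be checked in a few lines. This sketch uses only figures quoted in the text; the variable names are ours.

```python
from math import sqrt

s2_yx = 16.04               # Deviations mean square (26 d.f.)
t_xx = 73.0 / 2             # Treatments mean square for X
E_xx = 592.9                # Error S.S. of X
n = 10                      # replicates per treatment
unadjusted_ms = 995.1 / 27  # Error mean square of Y before adjustment

# Effective Error mean square per observation, formula (14.2.3).
s2_eff = s2_yx * (1 + t_xx / E_xx)
# Average standard error of a difference: square root of formula (14.2.2).
s_d_bar = sqrt(2 * s2_eff / n)
# t for the largest contrast, F - A, between adjusted means 10.16 and 6.71.
t_value = (10.16 - 6.71) / s_d_bar
# Relative efficiency of adjusted to unadjusted means.
efficiency = unadjusted_ms / s2_eff

print(f"s'^2 = {s2_eff:.2f}, s_D = {s_d_bar:.3f}, t = {t_value:.2f}, efficiency = {efficiency:.2f}")
```

Without rounding $s'$ first, $s_D$ comes out as 1.845 rather than 1.846; the other figures agree with the text.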

In experiments like this, in which X measures the same quantity as Y (score for leprosy bacilli), an alternative to covariance is to use (Y − X), the change in the score, as the measure of treatment effect. The Error mean square for (Y − X) is obtained from table 14.2.1 as

$$\frac{E_{yy} - 2E_{xy} + E_{xx}}{27} = \frac{995.1 - 2(585.4) + 592.9}{27} = 15.45$$

This compares with 17.03 for covariance. In this experiment, use of (Y − X) is slightly more efficient than covariance as well as quicker computationally. This was the recommended variable for analysis in the larger experiment from which these data were selected. In many experiments, (Y − X) is inferior to covariance, and may also be inferior to Y if the correlation between X and Y is low.

14.3 — The F-test of the adjusted means. Section 14.2 has shown how to make comparisons among the adjusted means. It is also possible to perform an F-test of the null hypothesis that all the $\mu_i$ are equal, that is, that there are no differences among the adjusted means. Since the way in which this test is computed often looks mystifying, we first explain its rationale.

First we indicate why b is always estimated from the Error line of the analysis of variance. Suppose that the value of b has not yet been chosen. As we have seen, the analysis of covariance is essentially an analysis of variance of the quantity (Y − bX). The Error sum of squares of this quantity may be written

$$E_{yy} - 2bE_{xy} + b^2 E_{xx}$$

Completing the square on b, the Error S.S. is

$$E_{xx}\left(b - \frac{E_{xy}}{E_{xx}}\right)^2 + E_{yy} - \frac{E_{xy}^2}{E_{xx}} \tag{14.3.1}$$

By the method of least squares, the value of b is selected so as to minimize the Error S.S. From (14.3.1), it is obvious that this happens when $b = E_{xy}/E_{xx}$, the minimum Error S.S. being $E_{yy} - E_{xy}^2/E_{xx}$.

Now to the F-test. If the null hypothesis is true, a covariance model in which $\mu_i = \mu$ should fit the data as well as the original covariance model. Consequently, we fit this $H_0$ model to find how large an Error S.S. it gives. In the analysis of sums of squares and products for the $H_0$ model, the “Error” line is the sum of the Error and Treatments lines in the original model, because the $H_0$ model contains no treatment effects. Hence, the Deviations S.S. from the $H_0$ model is

$$(E_{yy} + T_{yy}) - \frac{(E_{xy} + T_{xy})^2}{E_{xx} + T_{xx}} \tag{14.3.2}$$
If $H_0$ holds, the difference between the Deviations S.S. for the $H_0$ model and the original model, when divided by the difference in degrees of freedom, may be shown to be an estimate of $\sigma_{y \cdot x}^2$ in the original model. If $H_0$ is false, this mean square difference becomes large because the $H_0$ model fits poorly. This mean square difference forms the numerator of the F-test. The denominator is the Deviations mean square from the original model.

In table 14.3.1 the test is made for the leprosy example. The first step is to form a Treatments + Error line. (In a completely randomized design this line is, of course, the same as the Total line, but this is not so in randomized blocks or a Latin square.) Following formula (14.3.2) we subtract (731.2)²/665.9 = 802.9 from 1,288.7 to give the deviations S.S., 485.8, for the $H_0$ model. From this we subtract 417.1, the deviations S.S. for the original model, and divide by the difference in d.f., 2. The F-ratio, 34.35/16.04 = 2.14, with 2 and 26 d.f., lies between the 25% and the 10% levels.
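The recipe of section 14.3 can be packaged as a small function that takes the Treatments and Error lines, forms the Treatments + Error line, and compares the two Deviations sums of squares. This is a minimal sketch assuming a single covariate; the function name and the (d.f., Σx², Σxy, Σy²) tuple layout are our own choices.

```python
def covariance_f_test(treat, error):
    """F-test of adjusted treatment means (section 14.3).

    `treat` and `error` are (df, sum x^2, sum xy, sum y^2) tuples taken from the
    Treatments and Error lines of the analysis of sums of squares and products.
    Returns (F, numerator d.f., denominator d.f.).
    """
    t_df, t_xx, t_xy, t_yy = treat
    e_df, e_xx, e_xy, e_yy = error

    # Deviations from regression in the original model (one d.f. used by b).
    dev_e = e_yy - e_xy ** 2 / e_xx
    df_e = e_df - 1

    # H0 model: its "Error" line is Treatments + Error, formula (14.3.2).
    s_xx, s_xy, s_yy = t_xx + e_xx, t_xy + e_xy, t_yy + e_yy
    dev_h0 = s_yy - s_xy ** 2 / s_xx
    df_h0 = t_df + e_df - 1

    # Mean square for the increase in deviations when treatment effects are dropped.
    df_num = df_h0 - df_e
    ms_num = (dev_h0 - dev_e) / df_num
    return ms_num / (dev_e / df_e), df_num, df_e


# Leprosy data, table 14.3.1.
F, df1, df2 = covariance_f_test(treat=(2, 73.0, 145.8, 293.6), error=(27, 592.9, 585.4, 995.1))
print(f"F = {F:.2f} with {df1} and {df2} d.f.")
```

In a randomized blocks or Latin square design the Blocks (or Rows and Columns) lines are simply not passed in. Supplied with the Varieties and Error lines of table 14.4.2, the same function gives F of about 6.64 with 5 and 14 d.f.; with the soybean lines of Example 14.4.4 it gives about 4.80 with 3 and 8 d.f.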


TABLE 14.3.1
The Covariance F-test in a One-Way Classification. Leprosy Data

                                                           Deviations From Regression
Source        d.f.    Σx²     Σxy      Σy²    Reduction    d.f.   Sum of Squares   Mean Square
Treatments      2     73.0   145.8    293.6
Error          27    592.9   585.4    995.1     578.0        26        417.1          16.04
T + E          29    665.9   731.2  1,288.7     802.9        28        485.8
For testing adjusted means                                    2         68.7          34.35

(Reduction = reduction due to regression, 1 d.f.)


14.4 — Covariance in a two-way classification. The computations in- 
volve nothing new. The regression coefficient is estimated from the Error 
(Treatments × Blocks) line in the analysis of sums of squares and products, and the F-test of the adjusted treatment means is made by recomputing the regression from the Treatments plus Error lines, following the
procedure in section 14.3. To put it more generally for applications in 
which the words “Treatments” and “Blocks” are inappropriate, the 
regression coefficient is estimated from the Rows x Columns line, and 
either the adjusted row means or the adjusted column means may be 
tested. Two examples from experiments will be presented to illustrate 
points that arise in applications. 

The data in table 14.4.1 are from an experiment on the effects of 
two drugs on mental activity (1 3). The mental activity score was the sum 
of the scores on seven items in a questionnaire given to each of 24 volunteer 
subjects. The treatments were morphine, heroin, and placebo (an inert 
substance), given in subcutaneous injections. On different occasions, each subject received each drug in turn. The mental activity was measured before taking the drug (X) and at 1/2, 2, 3, and 4 hours after. The response data (Y) in table 14.4.1 are those at two hours after. As a common precaution in these experiments, eight subjects took morphine first, eight took heroin first, and eight took the placebo first, and similarly on the second and third occasions. In these data there was no apparent effect of the order in which drugs were given, and the order is ignored in the analysis of variance presented here.

TABLE 14.4.1
Mental Activity Scores Before (X) and Two Hours After (Y) a Drug

            Morphine     Heroin      Placebo      Total
Subject     X    Y       X    Y      X    Y       X    Y
   1        7    4       0    2      0    7       7   13
   2        2    2       4    0      2    1       8    3
   3       14   14      14   13     14   10      42   37
   4       14    0      10    0      5   10      29   10
   5        1    2       4    0      5    6      10    8
   6        2    0       5    0      4    2      11    2
   7        5    6       6    1      8    7      19   14
   8        6    0       6    2      6    5      18    7
   9        5    1       4    0      6    6      15    7
  10        6    6      10    0      8    6      24   12
  11        7    5       7    2      6    3      20   10
  12        1    3       4    1      3    8       8   12
  13        0    0       1    0      1    0       2    0
  14        8   10       9    1     10   11      27   22
  15        8    0       4   13     10   10      22   23
  16        0    0       0    0      0    0       0    0
  17       11    1      11    0     10    8      32    9
  18        6    2       6    4      6    6      18   12
  19        7    9       0    0      8    7      15   16
  20        5    0       6    1      5    1      16    2
  21        4    2      11    5     10    8      25   15
  22        7    7       7    7      6    5      20   19
  23        0    2       0    0      0    1       0    3
  24       12   12      12    0     11    5      35   17
Total     138   88     141   52    144  133     423  273

Source              d.f.     Σx²     Σxy      Σy²
Between subjects     23      910     519      558
Between drugs         2        1       5      137
Error                46      199     −16      422
Total                71    1,110     508    1,117

In planning this experiment two sources of variation were recog- 
nized. First, there are consistent differences in level of mental activity 
between subjects. This source was removed from the experimental error
by the device of having each subject test all three drugs, so that com- 
parisons between drugs are made within subjects. Secondly, a subject’s 
level changes from time to time — he feels sluggish on some occasions 
and unusually alert on others. Insofar as these differences are measured 
by the pretest mental activity score on each occasion, the covariance 
analysis should remove this source of error. 

As it turned out, the covariance was ineffective in this experiment. The error regression coefficient is actually slightly negative, b = −16/199, and shows no sign of statistical significance. Consequently, comparison of the drugs is best made from the 2-hour readings alone in this case.
Incidentally, covariance would have been quite effective in removing 
differences in mental activity between subjects, since the Between sub- 
jects b, 519/910, is positive and strongly significant. 

Unlike the previous leprosy example, the use of the change in score, 2 hours − pretest, would have been unwise as a measure of the effects of the drugs. From table 14.4.1 the Error sum of squares for (Y − X) is

$$422 + 199 - 2(-16) = 653$$

This is substantially larger than the sum of squares, 422, for Y alone.
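Because the Error sum of squares of (Y − X) follows from the same three Error-line quantities, $E_{yy} - 2E_{xy} + E_{xx}$, the comparison with Y alone or with covariance needs no new data. The sketch below applies it to the leprosy and mental activity Error lines quoted in the text; the function name is illustrative.

```python
def error_ss_of_change(e_yy, e_xy, e_xx):
    """Error sum of squares of (Y - X) from the Error-line sums of squares and products."""
    # Expanding sum((y - x)^2) over the Error deviations gives Syy - 2 Sxy + Sxx.
    return e_yy - 2 * e_xy + e_xx


# Leprosy (table 14.2.1): Error line with 27 d.f.
leprosy_ms = error_ss_of_change(995.1, 585.4, 592.9) / 27
# Mental activity (table 14.4.1): Error line with 46 d.f.
mental_ss = error_ss_of_change(422, -16, 199)

print(f"leprosy (Y - X) Error MS = {leprosy_ms:.2f}")     # compare 17.03 for covariance
print(f"mental activity (Y - X) Error SS = {mental_ss}")  # compare 422 for Y alone
```

The negative $E_{xy}$ in the mental activity data is what makes (Y − X) worse than Y: the cross-product term adds to, rather than subtracts from, the sum of squares.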

The second example, table 14.4.2, illustrates another issue (3). The 
experiment compared the yields Y of six varieties of corn. There was some 
variation from plot to plot in number of plants (stand). If this variation 
is caused by differences in fertility in different plots and if higher plant 
numbers result in higher yields per plot, increased precision will be obtained by adjusting for the covariance of yield on plant number. The plant
numbers in this event serve as an index of the fertility levels of the plots. 
But if some varieties characteristically have higher plant numbers than 
others through a greater ability to germinate or to survive when the plants 
are young, the adjustment for stand distorts the yields because it is trying 
to compare the varieties at some average plant number level that the 
varieties do not attain in practice. 

With this in mind, look first at the F-ratio for Varieties in X (stand). 
From table 14.4.2 the mean squares are: Varieties 9.17, Error 7.59, giving F = 1.21. The low value of F gives assurance that the variations in stand
are mostly random and that adjustment for stand will not introduce bias. 

In the analysis, note the use of the Variety plus Error line in computing the F-test of the adjusted means. The value of F is 645.38/97.22 = 6.64, highly significant with 5 and 14 d.f. The adjustment produced a striking decrease in the Error mean square, from 583.5 to 97.2, and an increase in F from 3.25 to 6.64.

The adjusted means will be found to be: 

A, 191.8; B, 191.0; C, 193.1; D, 219.3; E, 189.6; F, 213.6
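The randomized blocks computations can be sketched in the same way. The script below, a minimal illustration that assumes complete data (every variety in every block), rebuilds the Error line of table 14.4.2 from the plot values, estimates b from it, and forms the adjusted means and the average standard error of a difference from (14.2.2). The helper `two_way_products` is a hypothetical name of our own.

```python
from math import sqrt

# Table 14.4.2: stand X and yield Y for each variety, blocks 1 to 4 in order.
X = {"A": [28, 22, 27, 19], "B": [23, 26, 28, 24], "C": [27, 24, 27, 28],
     "D": [24, 28, 30, 30], "E": [30, 26, 26, 29], "F": [30, 25, 27, 24]}
Y = {"A": [202, 165, 191, 134], "B": [145, 201, 203, 180], "C": [188, 185, 185, 220],
     "D": [201, 231, 238, 261], "E": [202, 178, 198, 226], "F": [228, 221, 207, 204]}


def two_way_products(u, v):
    """Sums of products of u and v for the Total, Varieties, Blocks and Error lines."""
    names = list(u)
    r, c = len(names), len(u[names[0]])          # number of varieties, blocks
    corr = sum(map(sum, u.values())) * sum(map(sum, v.values())) / (r * c)
    total = sum(p * q for k in names for p, q in zip(u[k], v[k])) - corr
    varieties = sum(sum(u[k]) * sum(v[k]) for k in names) / c - corr
    blocks = sum(sum(u[k][j] for k in names) * sum(v[k][j] for k in names)
                 for j in range(c)) / r - corr
    # Error (Varieties x Blocks) is found by subtraction.
    return {"Total": total, "Varieties": varieties, "Blocks": blocks,
            "Error": total - varieties - blocks}


n_var, n_blocks = len(X), len(X["A"])
sxx, sxy, syy = two_way_products(X, X), two_way_products(X, Y), two_way_products(Y, Y)
e_xx, e_xy, e_yy = sxx["Error"], sxy["Error"], syy["Error"]
error_df = (n_var - 1) * (n_blocks - 1)            # 15

b = e_xy / e_xx                                     # regression from the Error line
s2_yx = (e_yy - e_xy ** 2 / e_xx) / (error_df - 1)  # Deviations mean square, 14 d.f.

grand_x = sum(map(sum, X.values())) / (n_var * n_blocks)
adjusted = {k: sum(Y[k]) / n_blocks - b * (sum(X[k]) / n_blocks - grand_x) for k in X}

# Average standard error of a difference, formula (14.2.2), with t_xx the
# Varieties mean square for X.
t_xx = sxx["Varieties"] / (n_var - 1)
s_d = sqrt(2 * s2_yx * (1 + t_xx / e_xx) / n_blocks)

print(f"b = {b:.3f}, s_D = {s_d:.2f}")
print({k: round(m, 1) for k, m in adjusted.items()})
```

The results agree with the list above to within 0.1. Carried unrounded, C and F come out at about 193.16 and 213.66, so the last digit of those two entries depends on rounding; b (about 8.06) and $s_D$ (about 7.25) match the values used in the text.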

The standard error of the difference between two adjusted means is 7.25, with 14 d.f. By either the LSD method or the sequential Newman-Keuls method, the two highest yielding varieties, D and F, are not significantly different, but they are significantly superior to all the others, which do not differ significantly among themselves.

TABLE 14.4.2
Stand (X) and Yield (Y) (Pounds Field Weight of Ear Corn) of Six Varieties of Corn. Covariance in Randomized Blocks

                                   Blocks
              1            2            3            4            Total
Varieties   X     Y      X     Y      X     Y      X     Y      X      Y
A          28   202     22   165     27   191     19   134      96    692
B          23   145     26   201     28   203     24   180     101    729
C          27   188     24   185     27   185     28   220     106    778
D          24   201     28   231     30   238     30   261     112    931
E          30   202     26   178     26   198     29   226     111    804
F          30   228     25   221     27   207     24   204     106    860
Total     162 1,166    151 1,181    165 1,222    154 1,225     632  4,794

                                                              Deviations From Regression
Source of Variation   d.f.    Σx²       Σxy         Σy²       d.f.   Sum of Squares   Mean Square
Total                  23   181.33   1,485.00   18,678.50
Blocks                  3    21.67       8.50      436.17
Varieties               5    45.83     559.25    9,490.00
Error                  15   113.83     917.25    8,752.33      14       1,361.07          97.22
Variety plus error     20   159.66   1,476.50   18,242.33      19       4,587.99
For testing adjusted means                                      5       3,226.92         645.38**

In some cases, plant numbers might be influenced partly by fertility variations and partly by basic differences between varieties. The possibility of a partial adjustment has been considered by H. F. Smith (4).

EXAMPLE 14.4.1 — Verify the adjusted means in the corn experiment and carry 
through the tests of all the differences. 

EXAMPLE 14.4.2 — Estimate the efficiency of the covariance adjustments. Ans. 5.55. 

EXAMPLE 14.4.3— As an alternative to covariance, could we analyze the yield per 
plant, Y/X, as a means of removing differences in plant numbers? Ans. This is satisfactory 
if the relation between Y and X is a straight line going through the origin. But b is often sub- 
stantially less than the mean yield per plant, because when plant numbers are high, competi- 
tion between plants reduces the yield per plant. If this happens, the use of Y/X overcorrects 
for stand. In the corn example b = 8.1 and the overall yield per plant is 4,794/632 = 7.6, 
in good agreement; yield per plant would give results similar to covariance. Of course,
yield per plant should be analyzed if there is direct interest in this quantity. 
EXAMPLE 14.4.4 — The following data are the yields (Y) in bushels per acre and the per cents of stem canker infection (X) in a randomized blocks experiment comparing four lines of soybeans (5).


                               Lines
Blocks       A             B             C             D             Totals
           X     Y       X     Y       X     Y       X     Y       X       Y
1        19.3  21.3    10.1  28.3     4.3  26.7    14.0  25.1     47.7   101.4
2        29.2  19.7    34.7  20.7    48.2  14.7    30.2  20.1    142.3    75.2
3         1.0  28.7    14.0  26.0     6.3  29.0     7.2  24.9     28.5   108.6
4         6.4  27.3     5.6  34.1     6.7  29.0     8.9  29.8     27.6   120.2
Totals   55.9  97.0    64.4 109.1    65.5  99.4    60.3  99.9    246.1   405.4


By looking at some plots with unusually high and unusually low X, note that there seems a definite negative relation between Y and X. Before removing this source of error by covariance, check that the lines do not differ in the amounts of infection. The analysis of sums of squares and products is as follows:

              d.f.      Σx²       Σxy      Σy²
Blocks          3   2,239.3    −748.0    272.9
Treatments      3      14.3      10.2     21.2
Error           9     427.0    −145.7     66.0
T + E          12     441.3    −135.5     87.2


(i) Perform the F-test of the adjusted means.

(ii) Find the adjusted means and test the differences among them.

(iii) Estimate the efficiency of the adjustments. Ans. (i) F = 4.79*, d.f. = 3, 8; (ii) A, 23.77; B, 27.52; C, 25.19; D, 24.87. By the LSD test, B significantly exceeds A and D. (iii) 3.56. Strictly, a slight correction to this figure should be made for the reduction in d.f. from 9 to 8.


14.5 — Interpretation of adjusted means in covariance. The most 
straightforward use of covariance has been illustrated by the preceding 
examples. In these, the covariate X is a measure of the responsiveness of
the experimental unit, either directly (as with the leprosy bacilli) or in- 
directly (as with number of plants). The adjusted means are regarded as 
better estimates of the treatment effects than the unadjusted means be- 
cause one of the sources of experimental error has been removed by the 
adjustments. 

Interpretation of adjusted means is usually more difficult when both 
Y and X show differences between treatments, or between groups in an 
observational study. As mentioned in section 14.1, adjusted means are 
sometimes calculated in this situation either in order to throw light on the 
way in which the treatments produce their effects or to remove a source of 
bias in the comparison of Y between groups. The computations remain 
unchanged, except that the use of the effective Error mean square $s'^2 = s_{y \cdot x}^2(1 + t_{xx}/E_{xx})$ of (14.2.3) is not recommended for finding an approximation to the variance of the difference between two adjusted means. Instead, use the correct formula:

$$s_D^2 = s_{y \cdot x}^2 \left[ \frac{2}{n} + \frac{(\bar{X}_{i.} - \bar{X}_{j.})^2}{E_{xx}} \right]$$

The reason is that when the X's differ from treatment to treatment, the term $(\bar{X}_{i.} - \bar{X}_{j.})^2$ can be large and can vary materially from one pair of means to another, so that $s_D^2$ is no longer approximately constant.

As regards interpretation, the following points should be kept in 
mind. If the X's vary widely between treatments or groups, the adjustment involves an element of extrapolation. To cite an extreme instance, suppose that one group of men have ages (X) in the forties, with mean about 45, while a second group are in their fifties with mean about 55. In the adjusted means, the two groups are being compared at mean age 50, although neither group may have any men at this specific age. In using the adjustment, we are assuming that the linear relation between Y and X holds somewhat beyond the limits of each sample. In this situation the value of $s_D^2$ becomes large, because the term $(\bar{X}_{i.} - \bar{X}_{j.})^2$ is large. The formula is warning us that the adjustments have a high element of uncertainty. It follows that the comparison of adjusted means has low precision. Finding that F- or t-tests of the adjusted means show no significance, we may reach the conclusion that “The differences in Y can be
explained as a consequence of the differences in X,” when a sounder 
interpretation is that the adjusted differences are so imprecise that only 
very large effects could have been detected. A safeguard is to compute 
confidence limits for some of the adjusted differences: if the F-test alone 
is made, this point can easily be overlooked. 

Secondly, if X is subject to substantial errors of measurement, the 
adjustment removes only part of any difference between the Y means that 
is due to differences in the X means. Under the simplest mathematical 
model, the fraction removed may be shown to be $\sigma_x^2/(\sigma_x^2 + \sigma_d^2)$, where $\sigma_d^2$ is the variance of the errors of measurement of X. This point could arise in an example mentioned in section 14.1, in which covariance was suggested for examining whether differences produced by soil fumigants on spring oats (Y) could be explained as a reflection of the effects of these treatments on the numbers of nematode cysts (X). The nematode cysts are counted by taking a number of small soil samples from each plot and sifting each sample carefully by some process. The estimate of X on each plot is therefore subject to a sampling error and perhaps also to an error caused by failure to detect some of the cysts. Because of these errors, some differences might remain among the adjusted Y means, leading to an erroneous inference that the differences in yield could not be fully explained by the effects of the treatments on the nematodes. Similarly, in observational studies the adjustment removes only a fraction $\sigma_x^2/(\sigma_x^2 + \sigma_d^2)$ of a bias due to a linear relation between Y and X. Incidentally, the errors of measurement d do not vitiate the use of covariance in increasing the precision of the Y comparisons in randomized experiments, provided that Y has a linear regression on the measurement $X' = X + d$. However, as might be expected, they make the adjustments less effective, because the correlation $\rho'$ between Y and $X' = X + d$ is less than the correlation $\rho$ between Y and X, so that the residual error variance $\sigma_y^2(1 - \rho'^2)$ is larger.
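The attenuation can be seen in a small simulation. The sketch below assumes one concrete version of the simplest model: X normal with standard deviation $\sigma_x$, an independent normal measurement error d with standard deviation $\sigma_d$, and $Y = \beta X + \epsilon$. All parameter values and the seed are arbitrary illustrative choices, not figures from the text. Because d has mean zero, the adjustment removes $b'$ times the difference in X means, so the fraction removed is the ratio of the slope on $X' = X + d$ to the slope on X.

```python
import random


def slope(xs, ys):
    """Least-squares regression coefficient sum(xy) / sum(x^2), deviations from means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2024)          # fixed seed so the run is reproducible
n, beta = 100_000, 2.0             # illustrative sample size and true slope
sigma_x, sigma_d = 3.0, 2.0        # s.d. of true X and of measurement error d

x_true = [rng.gauss(0, sigma_x) for _ in range(n)]
x_meas = [x + rng.gauss(0, sigma_d) for x in x_true]   # X' = X + d
y = [beta * x + rng.gauss(0, 1.0) for x in x_true]

ratio = slope(x_meas, y) / slope(x_true, y)
theory = sigma_x ** 2 / (sigma_x ** 2 + sigma_d ** 2)   # 9/13 for these values
print(f"observed fraction {ratio:.3f}, theoretical {theory:.3f}")
```

With these settings about 69% of the X-related difference is removed, so roughly 31% would survive in the adjusted Y means even though it is entirely due to X.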

Finally, the meaning of the adjusted values is often hard to grasp, 
especially if the reasons for the relation between Y and X are not well 
known. As an illustration, table 14.5.1 shows the average 1964 expendi- 
tures Y per attending pupil for schools in the states in each of five regions 
of the U.S. (6). These are simple averages of the values for the individual 
states in the region. Also shown are corresponding averages of 1963 per 
capita incomes X in each region. In an analysis of variance into Between 
Regions and Between States Within Regions, the differences between 
regions are significant both for the expenditure figures and the per capita 
incomes. Further, the regions fall in the same order for expenditures as 
for incomes. 


TABLE 14.5.1
1964 School Expenditures Per Attending Pupil (Y) and 1963 Per Capita Incomes (X) in Five Regions of the U.S.

                       East    Mountain     North      South      South
                               and Pacific  Central    Atlantic   Central
Number of states         8        11          12          9          8
                                      (dollars)
Expenditures           542       500         479        399        335
Per capita incomes   2,600     2,410       2,370      2,310      1,780


It seems natural to ask: Would the differences in expenditures dis- 
appear after allowing for the relation between expenditure and income? 
The within-region regression appears to be linear, and the values of b do 
not differ significantly from region to region. The average b is 0.140 
($14 in expenditure for each additional $100 of income). The adjusted 
means for expenditure, adjusted to the overall average income of $2,306, 
are as follows: 


(Dollars)

E       M and P     NC      SA      SC
501     485         470     398     409

The differences between regions have now shrunk considerably, although still significant, and the regions remain in the same order except
that the South Central region is no longer lowest. On reflection, however, 
these adjusted figures seem hypothetical rather than concrete. The figure 
of $409 for the South Central region cannot be considered an estimate of 
the amount that this region would spend per pupil if its per capita income 
were to increase rapidly, perhaps through greater industrialization, from 
$1,780 to $2,306. In fact, if we were trying to estimate this amount, a 
study of the Between Years regression of expenditure on income for individual states would be more relevant. Similarly, a conclusion that “the
differences in expenditures cannot be ascribed to differences in per capita 
income” is likely to be misunderstood by a non-technical reader. For a 
good discussion of other complications in interpretation, see (4). 
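The regional adjustment itself is a one-line computation once the within-region slope is known. The sketch below uses the rounded figures quoted in the text (b = 0.140 and an overall average income of $2,306), so it reproduces the adjusted means only to the nearest dollar; the names are illustrative.

```python
# Table 14.5.1 regional means: (expenditure Y, per capita income X), in dollars.
REGIONS = {
    "East": (542, 2600),
    "Mountain and Pacific": (500, 2410),
    "North Central": (479, 2370),
    "South Atlantic": (399, 2310),
    "South Central": (335, 1780),
}
B_WITHIN = 0.140       # average within-region slope quoted in the text
X_BAR = 2306           # overall average income used for the adjustment

# Adjusted mean: Y-bar - b (X-bar_region - X-bar overall)
adjusted = {name: y - B_WITHIN * (x - X_BAR) for name, (y, x) in REGIONS.items()}
for name, value in adjusted.items():
    print(f"{name:22s} {value:6.0f}")
```

The South Central figure rises by about $74 because its income lies $526 below the overall average; this is exactly the extrapolation the paragraph above warns about.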

14.6 — Comparison of regression lines. Frequently, the relation between Y and X is studied in samples obtained by different investigators, or in different environments, or at different times. In summarizing these results, the question naturally arises: can the regression lines be regarded
as the same? If not, in what respects do they differ? A numerical 
example provides an introduction to the handling of these questions. The 
example has only two samples, but the techniques extend naturally to 
more than two samples. 

In a survey to examine relationships between the nutrition and the 
health of women in the Middle West (7), the concentration of cholesterol 
in the blood serum was determined on 56 randomly selected subjects in 
Iowa and 130 in Nebraska. In table 14.6.1 are subsamples from the sur- 
vey data. Figure 14.6.1 shows graphs of the data from each state. The 
figure gives an impression of linearity of the regression of cholesterol 
concentration on age, which will be assumed in this discussion.

The purpose is to examine whether the linear regressions of choles- 
terol on age are the same in Iowa and Nebraska. They may differ in 
slope, in elevation, or in the residual variances $\sigma_{y \cdot x}^2$. The most convenient approach is to compare the residual variances first, then the slopes, and lastly the elevations. In terms of the model, we have

$$Y_{ij} = \alpha_i + \beta_i X_{ij} + \epsilon_{ij}$$

where i = 1, 2 denotes the two states. We first compare the residual variances $\sigma_1^2$ and $\sigma_2^2$, next $\beta_1$ and $\beta_2$, and finally the elevations of the lines, $\alpha_1$ and $\alpha_2$.

The computations begin by recording separately the Within sum of squares and products for each state, as shown in table 14.6.2 on lines 1 and 2. The next step is to find the residual S.S. from regression for each state, as on the right in lines 1 and 2. The Residual mean squares, 2,392 and 1,581, are compared by the two-tailed F-test (section 2.9) or, with more than two samples, by Bartlett's test (section 10.21). If heterogeneous variances were evident, this might be pertinent information in itself. In this example, F = 1.51, with 9 and 17 d.f., giving a P value greater than 0.40 in a two-tailed test. The mean squares show no sign of a real difference.

TABLE 14.6.1
Age and Concentration of Cholesterol (mg/100 ml) in the Blood Serum of Iowa and Nebraska Women

   Iowa, n = 11             Nebraska, n = 19
   Age   Cholesterol       Age   Cholesterol       Age   Cholesterol
    X        Y              X        Y              X        Y
   46       181            18       137            30       140
   52       228            44       173            47       196
   39       182            33       177            58       262
   65       249            78       241            70       261
   54       259            51       225            67       356
   33       201            43       223            31       159
   49       121            44       190            21       191
   76       339            58       257            56       197
   71       224            63       337
   41       112            19       189
   58       189            42       214

Iowa:      Sum X = 584, Sum Y = 2,285;  X̄1 = 53.1, Ȳ1 = 207.7
Nebraska:  Sum X = 873, Sum Y = 4,125;  X̄2 = 45.9, Ȳ2 = 217.1

Iowa       ΣX² = 32,834      ΣXY = 127,235     ΣY² = 515,355
   C:            31,005            121,313           474,657
           Σx² = 1,829       Σxy = 5,922       Σy² = 40,698
Nebraska   ΣX² = 45,677      ΣXY = 203,559     ΣY² = 957,785
   C:            40,112            189,533           895,559
           Σx² = 5,565       Σxy = 14,026      Σy² = 62,226
Total, n = 30: ΣX = 1,457, X̄ = 48.6; ΣY = 6,410, Ȳ = 213.7
           ΣX² = 78,511      ΣXY = 330,794     ΣY² = 1,473,140
   C:            70,762            311,312           1,369,603
           Σx² = 7,749       Σxy = 19,482      Σy² = 103,537

[Fig. 14.6.1: Graph of 11 pairs of Iowa data and 19 pairs from Nebraska. Age (years) is X and concentration of cholesterol is Y.]

Assuming homogeneity of residual variances, we now compare the two slopes or regression coefficients, 3.24 for Iowa and 2.52 for Nebraska. A look at the scatters of the points about the individual regression lines in figure 14.6.1 suggests that the differences in slope may be attributable to sampling variation. To make the test (table 14.6.2), first add the d.f. and S.S. for the deviations from the individual regressions, recording these sums in line 3. The mean square, 1,862, is the residual mean square obtained when separate regression lines are fitted in each state. Secondly, in line 4 we add the sums of squares and products, obtaining the pooled slope, 2.70, and the S.S. 49,107, representing deviations from a model in which a single pooled slope is fitted. The difference, 49,107 - 48,399 = 708 (line 5), with 1 d.f., measures the contribution of the difference between the two regression coefficients to the sum of squares of deviations. If there were k coefficients, this difference would have (k - 1) d.f. The corresponding mean square is compared with the Within States mean square, 1,862, by the F-test. In these data, F = 708/1,862 = 0.38, d.f. = 1, 26, supporting the assumption that the slopes do not differ.

TABLE 14.6.2
Comparison of Regression Lines. Cholesterol Data

                                                                              Deviations From Regression
Line  Source                      d.f.    Σx²       Σxy       Σy²     Reg. Coef.   d.f.     S.S.     M.S.
      Within
 1      Iowa                       10    1,829     5,922    40,698       3.24          9    21,524    2,392
 2      Nebraska                   18    5,565    14,026    62,226       2.52         17    26,875    1,581
 3                                                                                   26    48,399    1,862
 4    Pooled, W                    28    7,394    19,948   102,924       2.70         27    49,107    1,819
 5    Difference between slopes                                                       1       708      708
 6    Between, B                    1      355      -466       613
 7    W + B                        29    7,749    19,482   103,537                    28    54,557
 8    Between adjusted means                                                          1     5,450    5,450

Comparison of slopes: F = 708/1,862 = 0.38 (d.f. = 1, 26) N.S.
Comparison of elevations: F = 5,450/1,819 = 3.00 (d.f. = 1, 27) N.S.
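The computations of lines 1 to 5 of table 14.6.2 can be sketched in a few lines of Python. This is an illustrative sketch, not the book's own procedure: the helper names `deviations_ss` and `compare_slopes` are hypothetical, and each group is passed as a tuple (d.f., Σx², Σxy, Σy²) copied from the table.

```python
# Hypothetical helpers for comparing regression slopes across groups.
# Each group is (total d.f., Sxx, Sxy, Syy), i.e. d.f., sum x^2, sum xy, sum y^2.

def deviations_ss(sxx, sxy, syy):
    """Sum of squares of deviations from a fitted line: Syy - Sxy^2 / Sxx."""
    return syy - sxy ** 2 / sxx

def compare_slopes(groups):
    """Return (F, numerator d.f., denominator d.f.) for equality of slopes."""
    # Line 3: add deviations S.S. and d.f. from the separate regressions
    sep_ss = sum(deviations_ss(sxx, sxy, syy) for _, sxx, sxy, syy in groups)
    sep_df = sum(df - 1 for df, *_ in groups)
    # Line 4: pool the sums of squares and products and fit one common slope
    pooled = [sum(g[i] for g in groups) for i in range(1, 4)]
    pooled_ss = deviations_ss(*pooled)
    # Line 5: the extra S.S. is due to differences among the k slopes
    k = len(groups)
    f = ((pooled_ss - sep_ss) / (k - 1)) / (sep_ss / sep_df)
    return f, k - 1, sep_df

iowa = (10, 1829, 5922, 40698)
nebraska = (18, 5565, 14026, 62226)
f, df1, df2 = compare_slopes([iowa, nebraska])
print(f"F = {f:.2f} with {df1} and {df2} d.f.")
```

Run as is, it prints `F = 0.38 with 1 and 26 d.f.`, matching the text. Because it loops over groups, the same function handles k > 2 states with (k - 1) numerator d.f.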

Algebraically, the difference 708 in the sum of squares may be shown to be $E_1E_2(b_1 - b_2)^2/(E_1 + E_2)$, where $E_1$, $E_2$ are the values of $\Sigma x^2$ for the two states. With more than two states, the difference is $\Sigma w_i(b_i - \bar{b})^2$, where $w_i = E_i$ and $\bar{b}$ is the pooled slope, $\Sigma w_i b_i/\Sigma w_i$. The sum of squares of deviations of the b's is a weighted sum, because the variances of the $b_i$, namely $\sigma_{y\cdot x}^2/E_i$, depend on the values of $\Sigma x^2$.
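As a numerical check of this identity, the sketch below evaluates both the two-state formula and the general weighted form for the cholesterol data. The function name `slope_difference_ss` is hypothetical; the weights are taken as $w_i = E_i$, the inverse of $\mathrm{var}(b_i)$ apart from the common factor $\sigma_{y\cdot x}^2$.

```python
# Hypothetical helper: weighted S.S. of the slopes about the pooled slope.
def slope_difference_ss(sxx_list, sxy_list):
    """Sum of w_i * (b_i - b_bar)^2 with w_i = sum x^2 of group i."""
    slopes = [sxy / sxx for sxx, sxy in zip(sxx_list, sxy_list)]
    weights = sxx_list  # var(b_i) = sigma^2 / E_i, so E_i is the weight
    pooled = sum(w * b for w, b in zip(weights, slopes)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, slopes))

e1, e2 = 1829, 5565                  # sum x^2 for Iowa and Nebraska
b1, b2 = 5922 / e1, 14026 / e2       # individual slopes, about 3.24 and 2.52
two_group = e1 * e2 * (b1 - b2) ** 2 / (e1 + e2)
k_group = slope_difference_ss([e1, e2], [5922, 14026])
print(f"{two_group:.1f} {k_group:.1f}")
```

Both forms give about 708.6; the 708 in table 14.6.2 comes from subtracting sums of squares that were already rounded to whole numbers.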

If the sample regressions were found to differ significantly, this might 
end the investigation. Interpretation would involve the question: Why?
The final question about the elevations of the population regression lines 
usually has little meaning unless the lines are parallel. 

Assuming parallel lines and homogeneous variance, we write the model as

$$Y_{ij} = \alpha_i + \beta X_{ij} + \epsilon_{ij},$$

where i = 1, 2 denotes the state. It remains to test the null hypothesis $\alpha_1 = \alpha_2$. The least squares estimates of $\alpha_1$ and $\alpha_2$ are $\hat\alpha_1 = \bar{Y}_1 - b\bar{X}_1$ and $\hat\alpha_2 = \bar{Y}_2 - b\bar{X}_2$. Hence, the test of this $H_0$ is identical to the test of the $H_0$ that the adjusted means of the Y's are the same in the two states. This is, of course, the F-test of the difference between adjusted means that was made in section 14.3. It is made in the usual way in lines 4 to 8 of table 14.6.2. Line 4 gives the Pooled Within States sums of squares and products, while line 6 shows the Between States sums of squares and products. In line 7 these are combined, just as we combined Error and Treatments in section 14.3. A Deviations S.S., 54,557, is obtained from line 7, and the Deviations S.S. in line 4 is subtracted to give 5,450, the S.S. Between adjusted means. We find F = 3.00, d.f. 1, 27, P about 0.10. In the original survey the difference was smaller than in these subsamples. The investigators felt justified in combining the two states for further examination of the relation between age and cholesterol.
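The elevation test of lines 4, 6, 7, and 8 can be sketched the same way. This is an illustration under the assumption that the sums of squares and products are entered exactly as printed in table 14.6.2; `deviations_ss` is a hypothetical helper.

```python
def deviations_ss(sxx, sxy, syy):
    """Sum of squares of deviations from a fitted line: Syy - Sxy^2 / Sxx."""
    return syy - sxy ** 2 / sxx

within = (7394, 19948, 102924)    # line 4, Pooled Within States (28 d.f.)
between = (355, -466, 613)        # line 6, Between States (1 d.f.)
w_plus_b = tuple(w + b for w, b in zip(within, between))  # line 7

dev_w = deviations_ss(*within)      # 27 d.f.
dev_wb = deviations_ss(*w_plus_b)   # 28 d.f.
ss_adj = dev_wb - dev_w             # line 8: Between adjusted means, 1 d.f.
f = ss_adj / (dev_w / 27)
print(f"S.S. = {ss_adj:.0f}, F = {f:.2f} with 1 and 27 d.f.")
```

It prints `S.S. = 5450, F = 3.00 with 1 and 27 d.f.`, the figures in table 14.6.2.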

14.7 — Comparison of the “Between Classes” and the “Within Classes”
regressions. Continuing the theme of section 14.6, we sometimes need
to compare the Between Classes regression and the Within Classes regres- 
sion in the same study. In physiology or biochemistry, for instance, Y 
and X are measurements made on patients or laboratory animals. Often, 
the number of subjects is limited, but several measurements of Y and X 
have been made on each subject. The Between Subjects regression may 
be the one of primary interest. The objective of the comparison is 
to see whether the Within and Between regressions appear to estimate 
the same quantities. If so, they can be combined to give a better estimate 
of the Between Subjects relationship. 

The simplest model that might apply is as follows: 

$$Y_{ij} = \alpha + \beta X_{ij} + \epsilon_{ij}, \qquad (14.7.1)$$

where i denotes the class (subject). In this model the same regression line holds throughout the data. The best combined estimates of α and β are obtained by treating the data as a single sample, estimating α and β from the Total line in the analysis of variance.

Two consequences of this model are important: (1) The Between and Within lines furnish independent estimates of β; call these $b_1$ and b, respectively. (2) The residual mean squares $s_1^2$ and $s^2$ from the regressions in the Between and Within lines are both unbiased estimates of $\sigma^2$, the variance of the $\epsilon_{ij}$.

To test whether the same regression holds throughout, we therefore compare $b_1$ and b, and $s_1^2$ and $s^2$. Sometimes $b_1$ and b agree well, but $s_1^2$ is found to be much larger than $s^2$. One explanation is that all the $Y_{ij}$ for a subject are affected by an additional component of variation $d_i$, independent of the $\epsilon_{ij}$. This model is written

$$Y_{ij} = \alpha + \beta X_{ij} + d_i + \epsilon_{ij} \qquad (14.7.2)$$

If the subjects are a random sample from some population of subjects, the $d_i$ are usually regarded as a random variable from subject to subject with population mean zero and variance $\sigma_B^2$. Under this model, $b_1$ and b are still unbiased estimates of β, but with m pairs of observations per subject, $s_1^2$ is an unbiased estimate of $\sigma_1^2 = \sigma^2 + m\sigma_B^2$, while $s^2$ continues to estimate $\sigma^2$. Since the method of comparing b and $b_1$ and the best way of combining them depend on whether the component $d_i$ is present, we suggest that $s_1^2$ and $s^2$ be compared first by an F-test.
The calculations are illustrated by records from ten female leprosy
patients. The data are scores representing the abundance of leprosy bacilli
at four sites on the body, the $X_{ij}$ being initial scores and the $Y_{ij}$ scores after
48 weeks of a standard treatment. Thus m = 4, n = 10. (This example
is purely for illustration. This regression would probably not be of interest
in itself; further, records from many additional patients were available so
that a Between Patients regression could be satisfactorily estimated di- 
rectly.) Table 14.7.1 shows the initial computations. 


TABLE 14.7.1
Scores for Leprosy Bacilli at Four Sites on Ten Patients

                      d.f.     Σx²      Σxy      Σy²     Reg. Coef.
Between patients        9     28.00    26.00    38.23    b_1 = 0.939
Within patients        30     26.00    13.00    38.75    b   = 0.500
Total                  39     54.00    39.00    76.98

                      Reduction        Deviations From Regression
                      (Σxy)²/Σx²     d.f.     S.S.      M.S.
Between patients        24.14          8      14.09     s_1² = 1.761
Within patients          6.50         29      32.25     s²   = 1.112


After performing the usual analysis of sums of squares and products, the reduction in sum of squares due to regression is computed separately for the Between and Within lines (lower half of table 14.7.1). From these, the Deviations S.S. and M.S. are obtained. The F ratio is $s_1^2/s^2 = 1.761/1.112 = 1.58$ with 8 and 29 d.f., corresponding to a P level of
about 0.20.
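The lower half of table 14.7.1 follows from the upper half, as the sketch below shows. It assumes the middle column of the upper half is Σxy, which is consistent with the printed reductions 24.14 = 26.00²/28.00 and 6.50 = 13.00²/26.00. The helper name `deviations_ms` is hypothetical.

```python
def deviations_ms(df, sxx, sxy, syy):
    """Residual mean square after fitting a slope (one d.f. used by the slope)."""
    return (syy - sxy ** 2 / sxx) / (df - 1)

# (d.f., sum x^2, sum xy, sum y^2) from the upper half of table 14.7.1
s1_sq = deviations_ms(9, 28.00, 26.00, 38.23)    # Between patients
s_sq = deviations_ms(30, 26.00, 13.00, 38.75)    # Within patients
f = s1_sq / s_sq
print(f"s1^2 = {s1_sq:.3f}, s^2 = {s_sq:.3f}, F = {f:.2f} with 8 and 29 d.f.")
```

It prints `s1^2 = 1.761, s^2 = 1.112, F = 1.58 with 8 and 29 d.f.`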

Although F falls short of significance, the investigator may decide to assume that $\sigma_1^2$ is greater than $\sigma^2$, and thus to retain the model (14.7.2), particularly since the Between Patients mean square is significant for both Y and X individually. To compare $b_1$ and b under this model, note that the estimated variances of $b_1$ and b are $s_1^2/E_1$ and $s^2/E$, where $E_1$ and E are the values of $\Sigma x^2$ for Between Patients and Within Patients, respectively. From table 14.7.1 the ratio of $(b_1 - b)$ to its standard error is therefore

$$t' = \frac{0.939 - 0.500}{\sqrt{\dfrac{1.761}{28.00} + \dfrac{1.112}{26.00}}} = \frac{0.439}{0.325} = 1.35$$

which is clearly non-significant. The quantity t' is not distributed as t, but its significance level, if needed, is found by the approximate method in section 4.14. Since $s_1^2$ has 8 d.f. and $s^2$ has 29 d.f., find the 5% significance levels of t for 8 d.f. and 29 d.f., namely 2.306 and 2.045. Form a weighted mean of these two values, with weights $s_1^2/E_1 = 0.0629$ and
$s^2/E = 0.0428$. This mean is 2.20, the required 5% significance level of t'.
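The t' statistic and its approximate 5% level can be sketched as follows. The two tabled t values, 2.306 and 2.045, are the ones quoted in the text; everything else is computed from table 14.7.1.

```python
import math

b1, b = 0.939, 0.500
v1 = 1.761 / 28.00   # estimated variance of b1, s1^2 / E1
v = 1.112 / 26.00    # estimated variance of b, s^2 / E
t_prime = (b1 - b) / math.sqrt(v1 + v)

# 5% two-tailed t values for 8 and 29 d.f., as quoted in the text
t8, t29 = 2.306, 2.045
# Approximate 5% level of t': weighted mean with weights v1 and v
t_crit = (v1 * t8 + v * t29) / (v1 + v)
print(f"t' = {t_prime:.2f}, approximate 5% level = {t_crit:.2f}")
```

It prints `t' = 1.35, approximate 5% level = 2.20`.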

It remains to find a combined estimate of β from $b_1$ and b. In combining two independent estimates that are of unequal precision, a general rule is to weight each estimate inversely as its variance. In this example, as is usually the case in practice, we have only estimates $s_1^2/E_1 = 0.0629$ and $s^2/E = 0.0428$ of the variances of $b_1$ and b. If $s_1^2$ and $s^2$ both have at least 8 d.f., weight $b_1$ and b inversely as their estimated variances (8). The weights are $w_1 = 1/0.0629 = 15.9$, $w = 1/0.0428 = 23.4$, giving

$$\hat\beta = \frac{(15.9)(0.939) + (23.4)(0.500)}{39.3} = 0.678$$

If $W = w_1 + w = 39.3$, the standard error of $\hat\beta$ may be taken as (8)

$$\sqrt{\frac{1}{W}\left[1 + \frac{4w_1w}{W^2}\left(\frac{1}{f_1} + \frac{1}{f}\right)\right]} = 0.171,$$

where $f_1$, f are the d.f. in $s_1^2$, $s^2$. The second term above is an allowance
due to Meier (9) for sampling errors in the weights.
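A sketch of the weighted combination follows. It is written for any number of estimates using the general form of Meier's allowance, $1 + 4\Sigma(w_i/W)(1 - w_i/W)/f_i$, which reduces to the two-term expression above when there are two estimates; the function name `combine` is hypothetical.

```python
import math

def combine(estimates, variances, dfs):
    """Inverse-variance weighted mean and its standard error, including an
    allowance for weights estimated from mean squares with dfs d.f."""
    w = [1 / v for v in variances]
    total = sum(w)
    mean = sum(wi * bi for wi, bi in zip(w, estimates)) / total
    # Meier-type correction: 1 + 4 * sum (w_i/W)(1 - w_i/W) / f_i
    corr = 1 + 4 * sum((wi / total) * (1 - wi / total) / f
                       for wi, f in zip(w, dfs))
    return mean, math.sqrt(corr / total)

beta, se = combine([0.939, 0.500], [0.0629, 0.0428], [8, 29])
print(f"combined slope = {beta:.3f}, s.e. = {se:.3f}")
```

It prints `combined slope = 0.678, s.e. = 0.171`. Without the correction term the standard error would be $\sqrt{1/W} = 0.160$, so the allowance for estimated weights matters here.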

We now show how to complete the analysis if $\sigma_1^2 = \sigma^2$. Form a pooled estimate of $\sigma^2$ from $s_1^2$ and $s^2$. This is $\hat\sigma^2 = 46.34/37 = 1.252$ with 37 d.f. The estimated variance of $(b_1 - b)$ is

$$\frac{\hat\sigma^2}{E_1} + \frac{\hat\sigma^2}{E} = \frac{\hat\sigma^2(E_1 + E)}{E_1E} = \frac{(1.252)(54.00)}{(28.00)(26.00)} = 0.0929$$

Hence, $(b_1 - b)$ is tested by the ordinary t-test,

$$t = \frac{0.4386}{\sqrt{0.0929}} = \frac{0.4386}{0.305} = 1.44 \qquad (37\ \text{d.f.})$$

The pooled estimate of β is simply the estimate $\Sigma xy/\Sigma x^2$ from the Total line in the analysis of variance. This is 39.00/54.00 = 0.722, with standard
error $\sqrt{\hat\sigma^2/(E_1 + E)} = \sqrt{1.252/54.00} = 0.152$.
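For the case $\sigma_1^2 = \sigma^2$, the pooled calculations can be sketched directly from table 14.7.1; the difference 0.4386 is taken from the text.

```python
import math

e1, e = 28.00, 26.00                  # sum x^2, Between and Within patients
sigma2 = (14.09 + 32.25) / (8 + 29)   # pooled residual mean square, 37 d.f.
var_diff = sigma2 * (e1 + e) / (e1 * e)
t = 0.4386 / math.sqrt(var_diff)

beta = 39.00 / 54.00                  # slope from the Total line
se_beta = math.sqrt(sigma2 / (e1 + e))
print(f"sigma^2 = {sigma2:.3f}, t = {t:.2f}, beta = {beta:.3f} +/- {se_beta:.3f}")
```

It prints `sigma^2 = 1.252, t = 1.44, beta = 0.722 +/- 0.152`.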

Methods for extending this analysis to multiple regression are presented in (10).

14.8 — Multiple covariance. With two or more independent variables there is no change in the theory beyond the addition of extra terms in X. The method is illustrated for a one-way classification by the average daily gains of pigs in table 14.8.1. Presumably these are predicted at least partly by the ages and weights at which the pigs were started in the experiment, which compared four feeds.

This experiment is an example of a technique in experimental design 
known as balancing. The assignment of pigs to the four treatments was 
not made by strict randomization. Instead, pigs were allotted so that the means of the four lots agreed closely in both $X_1$ and $X_2$. An indication of the extent of the balancing can be seen by calculating the F-ratios for Treatments/Error from the analyses of variance of $X_1$ and $X_2$, given under table 14.8.1. These F's are 0.50 for $X_1$ and 0.47 for $X_2$, both well
below 1.
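The two balancing F-ratios come straight from the Treatments and Error sums of squares under table 14.8.1, as in this small sketch (the helper `f_ratio` is hypothetical):

```python
def f_ratio(ss_trt, df_trt, ss_err, df_err):
    """Treatments mean square divided by Error mean square."""
    return (ss_trt / df_trt) / (ss_err / df_err)

f_age = f_ratio(187.70, 3, 4548.20, 36)      # X1, initial age
f_weight = f_ratio(189.08, 3, 4876.90, 36)   # X2, initial weight
print(f"X1: F = {f_age:.2f}, X2: F = {f_weight:.2f}")
```

It prints `X1: F = 0.50, X2: F = 0.47`. Values below 1 indicate that the lot means agree more closely than random allotment would usually produce.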

The idea is that if $X_1$ and $X_2$ are linearly related to Y, this balancing produces a more accurate comparison among the Y means. One complication is that since the variance within treatments is greater than that between treatments for $X_1$ and $X_2$, the same happens to some extent for Y. Consequently, in the analysis of variance of Y the Error mean square is an overestimate and the F-test of Y gives too few significant results. However, if the covariance model holds, the analysis of covariance will give an unbiased estimate of error and a correct F-test for the adjusted means of Y. The situation is interesting in that, with balancing, the reason for using covariance is to obtain a proper estimate of error rather than to adjust the Y means. If perfect balancing were achieved, the adjusted Y means would be the same as the unadjusted means.

The first step is to calculate the six sums of squares and products, shown under table 14.8.1. Next, $b_1$ and $b_2$ are estimated from the Error lines, the normal equations being

$$4{,}548.20\,b_1 + 2{,}877.40\,b_2 = 5.6230$$
$$2{,}877.40\,b_1 + 4{,}876.90\,b_2 = 26.2190$$

The $c_{ij}$ inverse multipliers are

$$c_{11} = 0.0003508, \qquad c_{12} = -0.0002070, \qquad c_{22} = 0.0003272$$

These give

$$b_1 = -0.0034542, \qquad b_2 = 0.0074142$$

$$\text{Reduction in S.S.} = (-0.0034542)(5.6230) + (0.0074142)(26.2190) = 0.1750$$

$$\text{Deviations S.S.} = 0.8452 - 0.1750 = 0.6702\ (34\ \text{d.f.}); \qquad s^2 = 0.0197$$

The standard errors of $b_1$ and $b_2$ are

$$s_{b_1} = \sqrt{s^2c_{11}} = 0.00263, \qquad s_{b_2} = \sqrt{s^2c_{22}} = 0.00254$$

It follows that $b_2$ is definitely significant but $b_1$ is not. In practice, we might drop $X_1$ (age) at this stage and continue the analysis using the regression of Y on $X_2$ alone. But for illustration we shall adjust for both
variables.
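The Error-line computations can be reproduced with the explicit inverse of the 2 × 2 matrix of the normal equations. This is a plain-Python sketch that avoids any matrix library; the variable names are illustrative.

```python
import math

# Error-line sums of squares and products (table 14.8.1)
a11, a12, a22 = 4548.20, 2877.40, 4876.90   # sum x1^2, sum x1x2, sum x2^2
g1, g2 = 5.6230, 26.2190                     # sum x1y, sum x2y
syy, df_err = 0.8452, 36

# Inverse (c) multipliers of the 2 x 2 matrix
det = a11 * a22 - a12 ** 2
c11, c12, c22 = a22 / det, -a12 / det, a11 / det

b1 = c11 * g1 + c12 * g2
b2 = c12 * g1 + c22 * g2
reduction = b1 * g1 + b2 * g2
dev_ss = syy - reduction
s2 = dev_ss / (df_err - 2)        # two slopes fitted: 34 d.f.
se_b1, se_b2 = math.sqrt(s2 * c11), math.sqrt(s2 * c22)
print(f"b1 = {b1:.7f}, b2 = {b2:.7f}, s2 = {s2:.4f}")
print(f"s.e.(b1) = {se_b1:.5f}, s.e.(b2) = {se_b2:.5f}")
```

It prints `b1 = -0.0034542, b2 = 0.0074142, s2 = 0.0197` and `s.e.(b1) = 0.00263, s.e.(b2) = 0.00254`. The t-ratios are about 1.3 for $b_1$ and 2.9 for $b_2$.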

If an F-test of the adjusted means is wanted, make a new calculation of $b_1$ and $b_2$ from the Treatments plus Error lines, in this case the Total line. The results are $b_1 = -0.0032903$, $b_2 = 0.0074093$, Deviations S.S. = 0.8415 (37 d.f.). The F-test is made in table 14.8.2.
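The F-test of table 14.8.2 can be sketched by fitting the two-variable regression twice, once on the Error line and once on the Total (Treatments plus Error) line. The helper `fit_two` is hypothetical and solves the 2 × 2 normal equations by Cramer's rule.

```python
def fit_two(sxx1, sx1x2, sxx2, sx1y, sx2y, syy):
    """Slopes and Deviations S.S. for y on x1 and x2 (2 x 2 normal equations)."""
    det = sxx1 * sxx2 - sx1x2 ** 2
    b1 = (sxx2 * sx1y - sx1x2 * sx2y) / det
    b2 = (sxx1 * sx2y - sx1x2 * sx1y) / det
    return b1, b2, syy - (b1 * sx1y + b2 * sx2y)

_, _, dev_error = fit_two(4548.20, 2877.40, 4876.90, 5.6230, 26.2190, 0.8452)   # 34 d.f.
b1, b2, dev_total = fit_two(4735.90, 3037.55, 5065.98, 6.9235, 27.5408, 1.0228)  # 37 d.f.
ms_adj = (dev_total - dev_error) / 3     # adjusted Treatment means, 3 d.f.
ms_err = dev_error / 34
f = ms_adj / ms_err
print(f"b1 = {b1:.7f}, b2 = {b2:.7f}, Deviations S.S. = {dev_total:.4f}")
print(f"F = {f:.2f} with 3 and 34 d.f.")
```

It prints `b1 = -0.0032903, b2 = 0.0074093, Deviations S.S. = 0.8415` and `F = 2.90 with 3 and 34 d.f.`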

TABLE 14.8.1
Initial Age ($X_1$), Initial Weight ($X_2$), and Rate of Gain (Y) of 40 Pigs
(Four treatments in lots of equal size)

               Treatment 1                          Treatment 2
        Initial   Initial     Gain, Y        Initial   Initial     Gain, Y
        Age, X1   Weight, X2  (pounds        Age, X1   Weight, X2  (pounds
        (days)    (pounds)    per day)       (days)    (pounds)    per day)
          78        61         1.40            78        74         1.61
          90        59         1.79            99        75         1.31
          94        76         1.72            80        64         1.12
          71        50         1.47            75        48         1.35
          99        61         1.26            94        62         1.29
          80        54         1.28            91        42         1.24
          83        57         1.34            75        52         1.29
          75        45         1.55            63        43         1.43
          62        41         1.57            62        50         1.29
          67        40         1.26            67        40         1.26
Sums     799       544        14.64           784       550        13.19
Means    79.9      54.4        1.46           78.4      55.0        1.32

               Treatment 3                          Treatment 4
          78        80         1.67            77        62         1.40
          83        61         1.41            71        55         1.47
          79        62         1.73            78        62         1.37
          70        47         1.23            70        43         1.15
          85        59         1.49            95        57         1.22
          83        42         1.22            96        51         1.48
          71        47         1.39            71        41         1.31
          66        42         1.39            63        40         1.27
          67        40         1.56            62        45         1.22
          67        40         1.36            67        39         1.36
Sums     749       520        14.45           750       495        13.25
Means    74.9      52.0        1.44           75.0      49.5        1.32

Sums of Squares and Products
              d.f.      Σx1²       Σx1x2       Σx2²       Σx1y       Σx2y       Σy²
Treatments      3      187.70     160.15     189.08     1.3005     1.3218     0.1776
Error          36    4,548.20   2,877.40   4,876.90     5.6230    26.2190     0.8452
Total          39    4,735.90   3,037.55   5,065.98     6.9235    27.5408     1.0228

TABLE 14.8.2
Analysis of Covariance of Pig Gains: Deviations From Regression

Source of Variation                      Degrees of Freedom   Sum of Squares   Mean Square
Total                                            37               0.8415
Error                                            34               0.6702          0.0197
For testing adjusted Treatment means              3               0.1713          0.0571*

F = 0.0571/0.0197 = 2.90*, d.f. = 3, 34

The adjusted Y means are computed as follows. In our notation, $\bar{Y}_i$, $\bar{X}_{1i}$, and $\bar{X}_{2i}$ denote the means of Y, $X_1$, and $X_2$ for the ith treatment, while $\bar{X}_1$ and $\bar{X}_2$ denote the overall means of $X_1$ and $X_2$.


Treatment                  1        2        3        4      Multiplier
Ȳi                       1.46     1.32     1.44     1.32      1
X̄1i - X̄1                +2.9     +1.4     -2.1     -2.0      0.00345 = -b1
X̄2i - X̄2                +1.7     +2.3     -0.7     -3.2     -0.00741 = -b2
Ȳi, adjusted             1.46     1.31     1.44     1.34

Thus, for treatment 4,

$$\bar{Y}_{4,\text{adj.}} = \bar{Y}_4 - b_1(\bar{X}_{14} - \bar{X}_1) - b_2(\bar{X}_{24} - \bar{X}_2) = 1.32 + 0.00345(-2.0) - 0.00741(-3.2) = 1.34$$

There is little change from unadjusted to adjusted means because of 
the balancing. 
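All four adjusted means can be computed at once, as in the sketch below. It uses the treatment sums of table 14.8.1 divided by 10 and the Error-line slopes; because the lots are of equal size, the overall means of $X_1$ and $X_2$ equal the averages of the four treatment means.

```python
# Treatment means from table 14.8.1 (sums / 10) and Error-line slopes
y_means = [1.464, 1.319, 1.445, 1.325]
x1_means = [79.9, 78.4, 74.9, 75.0]
x2_means = [54.4, 55.0, 52.0, 49.5]
b1, b2 = -0.0034542, 0.0074142

# Equal lot sizes, so the overall mean is the mean of the treatment means
x1_bar = sum(x1_means) / 4
x2_bar = sum(x2_means) / 4
adjusted = [y - b1 * (u - x1_bar) - b2 * (v - x2_bar)
            for y, u, v in zip(y_means, x1_means, x2_means)]
print([round(a, 2) for a in adjusted])
```

It prints `[1.46, 1.31, 1.44, 1.34]`, the adjusted means in the table above.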

The estimated variance of the difference between the adjusted means of the ith and jth treatments is

$$s^2\left[\frac{2}{n} + c_{11}(\bar{X}_{1i} - \bar{X}_{1j})^2 + 2c_{12}(\bar{X}_{1i} - \bar{X}_{1j})(\bar{X}_{2i} - \bar{X}_{2j}) + c_{22}(\bar{X}_{2i} - \bar{X}_{2j})^2\right]$$

As with covariance on a single X-variable (section 14.2), an average error variance can be used for comparisons among the adjusted means if there are at least 20 d.f. for Error. The effective Error mean square per observation is

$$s'^2 = s^2\left[1 + c_{11}t_{11} + 2c_{12}t_{12} + c_{22}t_{22}\right]$$

where $t_{11}$, $t_{22}$, and $t_{12}$ are the Treatments mean squares and mean product. This equation is the extension of (14.2.3) to two X-variables. In these data

$$s'^2 = 0.0197\left[1 + \{(0.3508)(62.6) - 2(0.2070)(53.4) + (0.3272)(63.0)\}/10^3\right] = (0.0197)(1.020) = 0.0201$$
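The effective Error mean square can be checked with a short sketch, using the rounded $c_{ij}$ multipliers from the text and the Treatments lines of table 14.8.1 divided by their 3 d.f.:

```python
s2 = 0.0197                                   # Error mean square, 34 d.f.
c11, c12, c22 = 0.0003508, -0.0002070, 0.0003272
# Treatments mean squares and mean product (3 d.f.)
t11, t12, t22 = 187.70 / 3, 160.15 / 3, 189.08 / 3
factor = 1 + c11 * t11 + 2 * c12 * t12 + c22 * t22
s2_eff = s2 * factor
print(f"factor = {factor:.3f}, s'^2 = {s2_eff:.4f}")
```

It prints `factor = 1.020, s'^2 = 0.0201`. The factor is close to 1 because the balancing made the treatment means of $X_1$ and $X_2$ nearly equal.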

For instance, to find 95% confidence limits for the difference between the adjusted means of treatments 1 and 2, we have

D = 1.46 - 1.31 = 0.15 pounds per day
$s_D = \sqrt{2s'^2/10} = \sqrt{0.00402} = 0.0634$

The difference 0.15 pounds between Treatments 1 and 2 is the greatest of the six differences between pairs of treatments. It is the only difference that is significant by the LSD test. By the Newman-Keuls test, none of the differences is significant, the required difference for 5% significance between the highest and the lowest means being 0.17 pounds. This is one of those occasional examples in which, although F is significant (just at the 5% level), none of the individual differences between pairs is clearly
significant.
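A sketch of the confidence limits and the LSD follows. The two-tailed 5% t value for 34 d.f., 2.032, is taken from standard t tables and is an assumption of this sketch, not a figure printed in the text above.

```python
import math

s2_eff, n = 0.0201, 10       # effective Error mean square; pigs per treatment
d = 1.46 - 1.31              # adjusted means, treatments 1 and 2
s_d = math.sqrt(2 * s2_eff / n)
t05 = 2.032                  # 5% two-tailed t for 34 d.f. (from t tables)
lsd = t05 * s_d
print(f"D = {d:.2f}, s_D = {s_d:.4f}, LSD = {lsd:.3f}")
print(f"95% limits: {d - lsd:.3f} to {d + lsd:.3f}")
```

The LSD is about 0.13, so D = 0.15 exceeds it and the 95% limits (about 0.02 to 0.28) exclude zero. This agrees with the statement that only this difference is significant by the LSD test.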

These data also illustrate the point that the regression of Y on $X_1$ alone may be quite different from the same regression, Y on $X_1$, when another X variable is included in the model; even the signs may be opposite. Consider the regression of Y on $X_1$ (age) in the pig data. Using Totals, the regression coefficient is

$$b_{Y1} = 6.9235/4{,}735.90 = 0.00146\ \text{lb./day/day of age}$$

Compare this with $b_{Y1\cdot 2} = -0.00329$, calculated earlier in this section, also for Total. Why should average daily gain increase with age in the first case and decrease with age in the second?


TABLE 14.8.3
Data on 40 Pigs Classified by Initial Weight

Initial   Number    Initial Age and Average Daily Gain                            Mean
Weight    of Pigs
39-44       13      Age    62     63(2)*  66     67(5)  70     71     83     91     69.5
                    Gain   1.57   1.35    1.39   1.36   1.15   1.31   1.22   1.24   1.34
45-49        5      Age    62     70      71     75(2)                              70.6
                    Gain   1.22   1.23    1.39   1.45                               1.35
50-54        5      Age    62     71      75     80     96                          76.8
                    Gain   1.29   1.47    1.29   1.28   1.48                        1.36
55-59        5      Age    71     83      85     90     95                          84.8
                    Gain   1.47   1.34    1.49   1.79   1.22                        1.46
60-64        8      Age    77     78(2)   79     80     83     94     99            83.5
                    Gain   1.40   1.38    1.73   1.12   1.41   1.29   1.26          1.37
74-80        4      Age    78(2)  94      99                                        87.2
                    Gain   1.64   1.72    1.31                                      1.58
Total       40                                                                      77.05
                                                                                    1.388

* Number of pigs of this age
The first regression is an overall effect, ignoring initial weight. In this sample there was a slight tendency for the initially older pigs to gain faster. But among pigs of the same initial weight (initial weight held constant) the older pigs tended to gain more slowly.

These facts may be observed in table 14.8.3. The right-hand column shows that both initial age and rate of gain increase with initial weight; they are positively associated because of their common association with initial weight. But within the rows of the table, where initial weight doesn't change much, there is the opposite tendency: the older pigs tend to gain more slowly. Table 14.8.4 gives the within-weight regressions. In the last line is the Pooled regression, -0.00335. This average differs only slightly from $b_{Y1\cdot 2} = -0.00329$, both estimating the same effect, the regression of average daily gain on initial age in a population of pigs all having the same initial weight.

TABLE 14.8.4
Analysis of Covariance in Weight Classes of Pigs

Weight    Degrees of    Sums of Squares and Products          Regression of
Class     Freedom       Σx1²          Σx1y        Σy²          Y on X1
39-44        12           831.2308    -6.1885     0.1917       -0.007445
45-49         4           113.2000     2.0860     0.0729        0.018428
50-54         4           634.8000     2.5720     0.0427        0.004052
55-59         4           324.8000    -0.6480     0.1819       -0.001995
60-64         7           486.0000    -3.6700     0.2140       -0.007551
74-80         3           354.7500    -3.3375     0.1015       -0.009408
Pooled       34         2,744.7808    -9.1860     0.8047       -0.003347
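The reversal of sign can be reproduced from table 14.8.4 and the Total line of table 14.8.1. The sketch pools the within-class sums of squares and products before dividing, which is how the Pooled line is formed.

```python
# Within-class (sum x1^2, sum x1y) for each initial-weight class, table 14.8.4
classes = {
    "39-44": (831.2308, -6.1885),
    "45-49": (113.2000, 2.0860),
    "50-54": (634.8000, 2.5720),
    "55-59": (324.8000, -0.6480),
    "60-64": (486.0000, -3.6700),
    "74-80": (354.7500, -3.3375),
}
for name, (sxx, sxy) in classes.items():
    print(f"{name}: b = {sxy / sxx:+.6f}")

# Pooled within-class slope: add sums of squares and products, then divide
sxx_pool = sum(sxx for sxx, _ in classes.values())
sxy_pool = sum(sxy for _, sxy in classes.values())
b_pooled = sxy_pool / sxx_pool

# Slope from the Total line of table 14.8.1, ignoring initial weight
b_total = 6.9235 / 4735.90
print(f"pooled within classes: {b_pooled:+.6f}")
print(f"total, ignoring weight: {b_total:+.5f}")
```

The pooled within-class slope is negative (about -0.00335), while the slope that ignores initial weight is positive (about +0.00146), which is the contrast discussed above.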


14.9 — Multiple covariance in a 2-way table. As illustration we select data from an experiment (11, 12) carried out in Britain from 1932 to 1937. The objective was to learn how well the wheat crop could be forecast from measurements on a sample of growing plants. During the growing season a uniform series of measurements was taken at a number of places throughout the country. The data in table 14.9.1 are for three seasons at each of six places and are the means of two standard varieties. In the early stages of the experiment it appeared that most of the available information was contained in two variables, shoot height at the time when ears emerge, $X_1$, and plant numbers at tillering, $X_2$.

For an initial examination of relationships, the data on Y, $X_1$, and $X_2$ should be free of the place and season effects. Consequently, the regression is calculated from the Error or Places × Seasons Interactions line. If, however, the regression is to be successful for routine use in predicting yields, it should also predict the differences in yield between seasons. It might even predict the differences in yield between places, though this is too much to expect unless the X-variables can somehow express the effects of differences in soil types and soil fertilities between stations. Consequently, in data of this type, there is interest in comparing the Between Seasons and Between Places regressions with the Error regression, though we shall not pursue this aspect of the analysis.


TABLE 14.9.1
Heights of Shoots at Ear Emergence ($X_1$), Number of Plants at Tillering ($X_2$),
and Yield (Y) of Wheat in Great Britain
($X_1$, inches; $X_2$, number per foot; Y, cwt. per acre)

                                            Place
Year    Variate  Seale Hayne  Rothamsted  Newport  Boghall  Sprowston  Plumpton   Year Sums
1933    X1          25.6         25.4       30.8     33.0      28.5       28.0       171.3
        X2          14.9         13.3        4.6     14.7      12.8        7.5        67.8
        Y           19.0         22.2       35.3     32.8      25.3       35.8       170.4
1934    X1          25.4         28.3       35.3     32.4      25.9       24.2       171.5
        X2           7.2          9.5        6.8      9.7       9.2        7.5        49.9
        Y           32.4         32.2       43.7     35.7      28.3       35.2       207.5
1935    X1          27.9         34.4       32.5     27.5      23.7       32.9       178.9
        X2          18.6         22.2       10.0     17.6      14.4        7.9        90.7
        Y           26.2         34.7       40.0     29.6      20.6       47.2       198.3
Place   X1          78.9         88.1       98.6     92.9      78.1       85.1       521.7
Sums    X2          40.7         45.0       21.4     42.0      36.4       22.9       208.4
        Y           77.6         89.1      119.0     98.1      74.2      118.2       576.2

              d.f.     Σx1²      Σx1x2      Σx2²
Places          5     106.34     -47.06    171.46
Seasons         2       6.26      26.24    139.41
Error          10     117.93      20.17     74.20
Total          17     230.53      -0.65    385.07

              d.f.     Σx1y      Σx2y       Σy²
Places          5     190.83   -257.03    629.22
Seasons         2       8.41    -22.26    124.42
Error          10     142.01    -21.46    228.66
Total          17     341.25   -300.75    982.30


The results obtained from the Error line are: $b_1 = 1.3148$, $b_2 = -0.6466$, $\Sigma\hat{y}^2 = 200.59$, $\Sigma d^2 = 28.07$ (8 d.f.). These statistics, with some from the table, lead to the following information:

1. Freed from season and place effects, height of shoots and number of plants together account for

$$\Sigma\hat{y}^2/\Sigma y^2 = 200.59/228.66 = 88\%$$

of the Error sum of squares for yield.

2. The predictive values of the two independent variables are indicated by the following analysis of $\Sigma y^2$:

Source                        Degrees of Freedom   Sum of Squares   Mean Square
Regression on X1 and X2               2                200.59
  Regression on X1 alone              1                171.01
  X2 after X1                         1                 29.58            29.58*
  Regression on X2 alone              1                  6.21
  X1 after X2                         1                194.38           194.38**
Deviations                            8                 28.07             3.51

While each X accounts for a significant reduction in $\Sigma y^2$, shoot height is the more effective.

3. The Error regression equation is

$$\hat{Y} = 1.393 + 1.3148X_1 - 0.6466X_2$$

Substituting each pair of X's, the values of $\hat{Y}$ and $Y - \hat{Y}$ are calculated for each place in each season and entered in table 14.9.2.
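Items 1 and 2 can be reproduced from the Error line of table 14.9.1. The sketch below solves the 2 × 2 normal equations directly and partitions $\Sigma y^2$; the variable names are illustrative.

```python
# Error-line sums of squares and products from table 14.9.1 (10 d.f.)
sx1x1, sx1x2, sx2x2 = 117.93, 20.17, 74.20
sx1y, sx2y, syy = 142.01, -21.46, 228.66

# Solve the 2 x 2 normal equations (Cramer's rule)
det = sx1x1 * sx2x2 - sx1x2 ** 2
b1 = (sx2x2 * sx1y - sx1x2 * sx2y) / det
b2 = (sx1x1 * sx2y - sx1x2 * sx1y) / det

reg_both = b1 * sx1y + b2 * sx2y     # regression on X1 and X2, 2 d.f.
reg_x1 = sx1y ** 2 / sx1x1           # regression on X1 alone
reg_x2 = sx2y ** 2 / sx2x2           # regression on X2 alone
dev = syy - reg_both                 # deviations, 10 - 2 = 8 d.f.
ms_dev = dev / 8

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, share of Error S.S. = {reg_both / syy:.0%}")
print(f"X2 after X1: S.S. = {reg_both - reg_x1:.2f}, F = {(reg_both - reg_x1) / ms_dev:.1f}")
print(f"X1 after X2: S.S. = {reg_both - reg_x2:.2f}, F = {(reg_both - reg_x2) / ms_dev:.1f}")
```

The F values, about 8.4 and 55.4 with 1 and 8 d.f., correspond to the single and double asterisks in the analysis of $\Sigma y^2$.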

TABLE 14.9.2
Actual and Estimated Yields of Wheat

                      1933                  1934                  1935            Sum
Place            Y     Ŷ    Y-Ŷ        Y     Ŷ    Y-Ŷ        Y     Ŷ    Y-Ŷ       Y-Ŷ
Seale Hayne    19.0  25.4   -6.4     32.4  30.1    2.3     26.2  26.0    0.2      -3.9
Rothamsted     22.2  26.2   -4.0     32.2  32.5   -0.3     34.7  32.3    2.4      -1.9
Newport        35.3  38.9   -3.6     43.7  43.4    0.3     40.0  37.7    2.3      -1.0
Boghall        32.8  35.3   -2.5     35.7  37.7   -2.0     29.6  26.2    3.4      -1.1
Sprowston      25.3  30.6   -5.3     28.3  29.5   -1.2     20.6  23.2   -2.6      -9.1
Plumpton       35.8  33.4    2.4     35.2  28.4    6.8     47.2  39.5    7.7      16.9
Sums                       -19.4                   5.9                  13.4      -0.1
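The entries of table 14.9.2 can be regenerated from the data of table 14.9.1 and the Error regression equation, as in this sketch. Seasonal totals of unrounded residuals may differ in the last digit from the table, whose sums were formed from rounded entries; the helper name `predict` is hypothetical.

```python
# Table 14.9.1 data; places in the order Seale Hayne, Rothamsted,
# Newport, Boghall, Sprowston, Plumpton.
x1 = {1933: [25.6, 25.4, 30.8, 33.0, 28.5, 28.0],
      1934: [25.4, 28.3, 35.3, 32.4, 25.9, 24.2],
      1935: [27.9, 34.4, 32.5, 27.5, 23.7, 32.9]}
x2 = {1933: [14.9, 13.3, 4.6, 14.7, 12.8, 7.5],
      1934: [7.2, 9.5, 6.8, 9.7, 9.2, 7.5],
      1935: [18.6, 22.2, 10.0, 17.6, 14.4, 7.9]}
y = {1933: [19.0, 22.2, 35.3, 32.8, 25.3, 35.8],
     1934: [32.4, 32.2, 43.7, 35.7, 28.3, 35.2],
     1935: [26.2, 34.7, 40.0, 29.6, 20.6, 47.2]}

def predict(height, plants):
    """Error regression equation from the text."""
    return 1.393 + 1.3148 * height - 0.6466 * plants

season_mean_resid = {}
for year in (1933, 1934, 1935):
    resid = [obs - predict(h, p) for h, p, obs in zip(x1[year], x2[year], y[year])]
    season_mean_resid[year] = sum(resid) / len(resid)
    print(year, [round(r, 1) for r in resid], round(season_mean_resid[year], 1))
```

The mean residual is about -3.2 cwt. per acre in 1933 and about +2.2 in 1935, showing the seasonal pattern discussed next.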


It seems clear from table 14.9.2 that the regression has not been successful in predicting the differences between seasons. There is a consistent overestimation in 1933, which averaged 19.4/6 = 3.2 cwt./acre, and an underestimation in 1935. If a test of significance of the difference between the adjusted seasonal yields is needed, the procedure is the same as for the F-test of adjusted means in section 14.8. Add the sums of squares and products for Seasons and Error in table 14.9.1. Recalculate the regression from these figures, finding the Deviations S.S., 120.01 with 10 d.f. The difference, 120.01 - 28.07 = 91.94, has 2 d.f., giving a mean square 45.97 for the differences between adjusted seasonal yields. The value of F is 45.97/3.51
= 13.1** with 2 and 8 d.f.
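The seasonal test can be sketched by fitting the regression on the Error line and on the Seasons plus Error line. The helper `deviations_ss` is hypothetical; each tuple is (Σx1², Σx1x2, Σx2², Σx1y, Σx2y, Σy²) from table 14.9.1.

```python
def deviations_ss(a11, a12, a22, g1, g2, syy):
    """Deviations S.S. after fitting y on x1 and x2 from sums of squares/products."""
    det = a11 * a22 - a12 ** 2
    b1 = (a22 * g1 - a12 * g2) / det
    b2 = (a11 * g2 - a12 * g1) / det
    return syy - (b1 * g1 + b2 * g2)

error = (117.93, 20.17, 74.20, 142.01, -21.46, 228.66)    # 10 d.f.
seasons = (6.26, 26.24, 139.41, 8.41, -22.26, 124.42)     # 2 d.f.
seasons_plus_error = tuple(e + s for e, s in zip(error, seasons))

dev_error = deviations_ss(*error)                   # 8 d.f.
dev_combined = deviations_ss(*seasons_plus_error)   # 10 d.f.
ms_seasons = (dev_combined - dev_error) / 2
f = ms_seasons / (dev_error / 8)
print(f"Deviations S.S. = {dev_combined:.2f}, mean square = {ms_seasons:.2f}, F = {f:.1f}")
```

It prints `Deviations S.S. = 120.01, mean square = 45.97, F = 13.1`.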
CHAPTER FIFTEEN

Curvilinear regression

15.1 — Introduction. Although linear regression is adequate for many needs, some variables are not connected by so simple a relation. The discovery of a precise description of the relation between two or more quantities is one of the problems of curve fitting, known as curvilinear regression. From this general view the fitting of the straight line is a special case, the simplest and indeed the most useful.

The motives for fitting curves to non-linear data are various. Sometimes a good estimate of the dependent variable is wanted for any particular value of the independent. This may involve the smoothing of irregular data and the interpolation of estimated Y's for values of X not contained in the observed series. Sometimes the objective is to test a law relating the variables, such as a growth curve that has been proposed from previous research or from mathematical analysis of the mechanism by which the variables are connected. At other times the form of the relationship is of little interest; the end in view is merely the elimination of inaccuracies which non-linearity of regression may introduce into a correlation coefficient or an experimental error.

Figure 15.1.1 shows four common non-linear relations. Part (a) is the compound interest law or exponential growth curve $W = A(B^X)$, where we have written W in place of our usual Y. If B = 1 + i, where i is the annual rate of interest, W gives the amount to which a sum of money A will rise if left at compound interest for X years. As we shall see, this curve also represents the way in which some organisms grow at certain stages. The curve shown in Part (a) has A = 1.

If B is less than 1, this curve assumes the form shown in (b). It is often called an exponential decay curve, the value of W declining to zero from its initial value A as X increases. The decay of emissions from a radioactive element follows this curve.

The curve in (c) is $W = A - B\rho^X$, with $0 < \rho < 1$. This curve rises from the value (A - B) when X = 0, and steadily approaches a maximum value A, called the asymptote, as X becomes large. The curve goes by various names. In agriculture it has been known as Mitscherlich's law, from a German chemist (11) who used it to represent the relation between the yield W of a crop (grown in pots) and the amount of fertilizer X added to the soil in the pots. In chemistry it is sometimes called the first-order reaction curve. The name asymptotic regression is also used.

[Fig. 15.1.1: Four common non-linear curves. (a) Exponential growth law, $W = A(B^X) = Ae^{cX}$; (b) exponential decay law, $W = A(B^{-X}) = Ae^{-cX}$; (c) asymptotic regression, $W = A - B(\rho^X) = A - Be^{-cX}$; (d) logistic growth law, $W = A/(1 + B\rho^X)$.]

Curve (d), the logistic growth law, has played a prominent part in the
study of human populations. This curve gives a remarkably good fit to 
the growth of the U.S. population, as measured in the decennial censuses, 
from 1790 to 1940. 

In this chapter we shall illustrate the fitting of three types of curve: 
(1) certain non-linear curves, like those in (a) and (b), figure 15.1.1, which 
can be reduced to straight lines by a transformation of the W or the X 
scale; (2) the polynomial in X, which often serves as a good approximation; (3) non-linear curves, like (c) and (d), figure 15.1.1, requiring more
complex methods of fitting. 

EXAMPLE 15.1.1 — The fit of the logistic curve to the U.S. Census populations (excluding Hawaii and Alaska) for the 150-year period from 1790 to 1940 is an interesting example, both of the striking accuracy of the fit, and of its equally striking failure when extrapolated to give population forecasts for 1950 and 1960. The curve, fitted by Pearl and Reed (1), is

$$W = \frac{A}{1 + (66.69)(10^{-0.1398X})},$$

where the upper asymptote A is about 184 million, X = 1 in 1790, and one unit in X represents 10 years, so that X = 16 in 1940. The
table below shows the actual census population, the estimated population from the logistic, 
and the error of estimation. 


                  Population (millions)                         Population (millions)
Year      Actual    Estimated    A - E         Year      Actual    Estimated    A - E
1790        3.9        3.7       +0.2          1880       50.2       50.2        0.0
1800        5.3        5.1       +0.2          1890       62.9       62.8       +0.1
1810        7.2        7.0       +0.2          1900       76.0       76.7       -0.7
1820        9.6        9.5       +0.1          1910       92.0       91.4       +0.6
1830       12.9       12.8       +0.1          1920      105.7      106.1       -0.4
1840       17.1       17.3       -0.2          1930      122.8      120.1       +2.7
1850       23.2       23.0       +0.2          1940      131.4      132.8       -1.4
1860       31.4       30.3       +1.1          1950      150.7      143.8       +6.9
1870       38.6       39.3       -0.7          1960      178.5      153.0      +25.5


Note how poor the 1950 and 1960 forecasts are. The forecast from the curve is that the
U.S. population will never exceed 184 million; the actual 1966 population is already well 
over 190 million. The postwar baby boom and improved health services are two of the 
responsible factors. 
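The behavior of the fitted logistic can be explored with a short sketch. It assumes A = 184.0, taken from the remark that the curve never exceeds 184 million. With this value the estimates agree with the tabled ones to within about 0.1 to 0.2 million. It is not the published constant.

```python
A = 184.0  # assumed upper asymptote ("never exceed 184 million")

def logistic(x):
    """Logistic with the constants 66.69 and 0.1398 quoted in the example."""
    return A / (1 + 66.69 * 10 ** (-0.1398 * x))

census = {1790: 3.9, 1900: 76.0, 1940: 131.4, 1950: 150.7, 1960: 178.5}
for year, actual in census.items():
    x = (year - 1780) / 10   # X = 1 in 1790, one unit per decade
    est = logistic(x)
    print(year, round(est, 1), round(actual - est, 1))

# As X grows, 10 ** (-0.1398 * X) -> 0, so the curve levels off at A.
print(round(logistic(100), 1))
```

The errors stay small through 1940 and then grow rapidly, because the curve is already bending toward its ceiling A while the actual population kept rising.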


15.2 — The exponential growth curve. A characteristic of some of the simpler growth phenomena is that the increase at any moment is proportional to the size already attained. During one phase in the growth of a culture of bacteria, the numbers of organisms follow such a law. The relation is nicely illustrated by the dry weights of chick embryos at ages 6 to 16 days (2) recorded in table 15.2.1. The graph of the weights in figure 15.2.1 ascends with greater rapidity as age increases, the regression equation being of the form

$$W = A(B^X),$$

where A and B are constants to be estimated. Applying logarithms to the equation,

$$\log W = \log A + (\log B)X \qquad \text{or} \qquad Y = \alpha + \beta X,$$

where Y = log W, α = log A, and β = log B. This means that if log W instead of W is plotted against X, the graph will be linear. By the device of using the logarithm instead of the quantity itself, the data are said to be rectified.

TABLE 15.2.1
Dry Weights of Chick Embryos From Ages 6 to 16 Days,
Together With Common Logarithms

Age in Days,   Dry Weight, W   Common Logarithm
X              (grams)         of Weight, Y
 6             0.029           -1.538*
 7             0.052           -1.284
 8             0.079           -1.102
 9             0.125           -0.903
10             0.181           -0.742
11             0.261           -0.583
12             0.425           -0.372
13             0.738           -0.132
14             1.130            0.053
15             1.882            0.275
16             2.812            0.449

* From the table of logarithms, one reads log 0.029 = log 2.9 - log 100 = 0.462 - 2 = -1.538.

The values of Y = log W are set out in the last column of the table and are plotted against X in the figure. The regression equation, computed in the familiar manner from the columns X and Y in the table, is

$$\hat{Y} = 0.1959X - 2.689$$

The regression line fits the data points with unusual fidelity, the correla- 
tion between Y and X being 0.9992. The conclusion is that the chick 
embryos, as measured by dry weight, are growing in accord with the 
exponential law, the logarithm of the dry weight increasing at the esti- 
mated uniform rate of 0.1959 per day. 
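The rectified fit takes only a few lines of code to reproduce. The sketch below is a minimal illustration in Python using only the standard library, which is this example's choice and not something the text prescribes. It takes common logarithms of the tabulated weights itself rather than using the three-decimal logs printed in table 15.2.1, so the last digit of the coefficients may differ slightly from the values quoted above.

```python
import math

# Table 15.2.1: age X (days) and dry weight W (grams) of chick embryos
ages = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
weights = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
           0.425, 0.738, 1.130, 1.882, 2.812]


def fit_line(xs, ys):
    """Least-squares line y = a + b*x, plus the correlation r between x and y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b, sxy / math.sqrt(sxx * syy)


# Rectify the data: regress Y = log10(W) on X instead of W on X
log_weights = [math.log10(w) for w in weights]
a, b, r = fit_line(ages, log_weights)
print(f"Y = {b:.4f}X {a:+.3f}   (r = {r:.4f})")
```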

Often, the objective is to learn whether the data follow the exponential 
law. The graph of log W against X helps in making an initial judgment on 
this question, and may be sufficient to settle the point. If so, the use of 
semi-logarithmic graph paper avoids the necessity for looking up the 
logarithms of W. The horizontal rulings on this graph paper are drawn
to such a scale that the plotting of the original data results in a straight
line if the data follow the exponential growth law. Semi-log paper can
be purchased at most stationery shops. If you require a more thorough 
method of testing whether the relation between log W and X is linear, see 
the end of section 15.3. 

For those who know some calculus, the law that the rate of increase
at any stage is proportional to the size already attained is described
mathematically by the equation

$$\frac{dW}{dX} = cW,$$

where c is the constant relative rate of increase. This equation leads to the
relation

$$\log_e W = \log_e A + cX,$$
or
$$W = Ae^{cX}, \qquad (15.2.1)$$

where e = 2.718 is the base of the natural system of logarithms. Relation
15.2.1 is exactly the same as our previous relation

$$\log_{10} W = \alpha + \beta X$$

except that it is expressed in logs to base e instead of to base 10.

Fig. 15.2.1 — Dry weights of chick embryos at ages 6-16 days with fitted curves. Uniform scale: $W = 0.002046(1.57)^X$. Logarithmic scale: $Y = 0.1959X - 2.689$.

Since $\log_e W = (\log_{10} W)(\log_e 10) = 2.3026 \log_{10} W$, it follows that
$c = 2.3026\beta$. For the chick embryos, the relative rate of growth is
(2.3026)(0.1959) = 0.451 gm. per day per gm. It is clear that the relative
rate of growth can be computed from either common or natural logs.

To convert the equation log W = 0.1959X - 2.689 into the original
form, we have

$$W = (0.00205)(1.57)^X,$$

where 0.00205 = antilog(-2.689) = antilog(0.311 - 3) = 2.05/1,000
= 0.00205. Similarly, 1.57 = antilog(0.1959). In the exponential form,

$$W = (0.00205)e^{0.451X},$$

the exponent 0.451 being the relative rate.
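To check the conversions above, here is a small Python sketch. Its helper names are illustrative only. It turns the common-log coefficients into A, B, and the relative rate c = 2.3026β, and confirms that the two written forms of the curve give the same predictions.

```python
import math

# Common-log regression from the text: log10 W = alpha + beta * X
alpha, beta = -2.689, 0.1959

A = 10 ** alpha              # antilog(-2.689), about 0.00205
B = 10 ** beta               # antilog(0.1959), about 1.57
c = math.log(10) * beta      # relative growth rate: 2.3026 * beta


def weight_power_form(x):
    """Predicted dry weight (grams) at age x days, W = A * B**X."""
    return A * B ** x


def weight_exponential_form(x):
    """The same curve written as W = A * e**(c X)."""
    return A * math.exp(c * x)


print(f"A = {A:.5f}, B = {B:.3f}, c = {c:.3f} per day")
print(f"W at 13 days: {weight_power_form(13):.3f} g (observed 0.738 g)")
```

Because $B^X = e^{X \log_e B}$ and $\log_e B = 2.3026\beta = c$, the two functions differ only by floating-point rounding.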

Other relations that may be fitted by a simple transformation of the
W or the X variable are $W = \alpha + \beta/X$, $W = \alpha + \beta \log X$, and
$\log W = \alpha + \beta \log X$. The applicability of the proposed law should first be examined
graphically. Should the data appear to lie on a straight line in the relevant
transformed scale, proceed with the regression computation. For the
last of the above relations, logarithmic paper is available, both vertical
and horizontal rulings being in the logarithmic scale.

The transformation of a non-linear relation so that it becomes a 
straight line is a simple method of fitting, but it involves some assump- 
tions that should be noted. For the exponential growth curve, we are 
assuming that the population relation is of the form 

$$Y = \log W = \alpha + \beta X + \varepsilon, \qquad (15.2.2)$$

where the residuals ε are independent, and have zero means and constant
variance. Further, if we apply the usual tests of significance to α and β,
this involves the assumption that the ε's are normally distributed. Sometimes it seems more realistic, from our knowledge of the nature of the
process or of the measurements, to assume that residuals are normal and
have constant variance in the original W scale. This means that we
postulate a population relation

$$W = (A)(B^X) + d, \qquad (15.2.3)$$

where A, B now stand for population parameters, and the residuals d
are $\mathcal{N}(0, \sigma^2)$.

If equation 15.2.3 holds, it may be shown that in equation 15.2.2 the ε's
will not be normal, and their variances will change as X changes.
Given model 15.2.3, the efficient method of fitting is to estimate A and B
by minimizing

$$\sum (W - AB^X)^2$$

taken over the sample values. This produces non-linear equations in A
and B that must be solved by successive approximations. A general
method of fitting such equations is given in section 15.7.
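Fitting model 15.2.3 directly needs an iterative method. The sketch below shows one possible approach in Python, using the Gauss-Newton style linearization described in section 15.7. It adds a step-halving safeguard, which is not part of the text, so that the sum of squares never increases. It starts from the antilogs of the log-scale fit. The function names are hypothetical.

```python
# Table 15.2.1: age X (days) and dry weight W (grams)
ages = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
weights = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
           0.425, 0.738, 1.130, 1.882, 2.812]


def sum_sq(A, B):
    """Sum of (W - A * B**X)**2 over the sample, in the original W scale."""
    return sum((w - A * B ** x) ** 2 for x, w in zip(ages, weights))


def fit_power_curve(A, B, max_iter=100, tol=1e-12):
    """Gauss-Newton iterations for W = A * B**X, with step halving."""
    for _ in range(max_iter):
        resid = [w - A * B ** x for x, w in zip(ages, weights)]
        d_a = [B ** x for x in ages]                 # df/dA at the current values
        d_b = [A * x * B ** (x - 1) for x in ages]   # df/dB at the current values
        # Least-squares regression of the residuals on d_a and d_b (no constant)
        saa = sum(u * u for u in d_a)
        sab = sum(u * v for u, v in zip(d_a, d_b))
        sbb = sum(v * v for v in d_b)
        sar = sum(u * e for u, e in zip(d_a, resid))
        sbr = sum(v * e for v, e in zip(d_b, resid))
        det = saa * sbb - sab * sab
        step_a = (sar * sbb - sbr * sab) / det
        step_b = (sbr * saa - sar * sab) / det
        # Halve the correction until the sum of squares does not increase
        current, shrink = sum_sq(A, B), 1.0
        while sum_sq(A + shrink * step_a, B + shrink * step_b) > current:
            shrink /= 2
            if shrink < 1e-10:
                return A, B                          # no further improvement found
        A, B = A + shrink * step_a, B + shrink * step_b
        if current - sum_sq(A, B) < tol:
            break
    return A, B


A0, B0 = 10 ** -2.689, 10 ** 0.1959   # antilogs of the rectified fit
A1, B1 = fit_power_curve(A0, B0)
print(f"start:  A = {A0:.5f}, B = {B0:.4f}, SS = {sum_sq(A0, B0):.5f}")
print(f"refit:  A = {A1:.5f}, B = {B1:.4f}, SS = {sum_sq(A1, B1):.5f}")
```

The criterion is now measured in grams rather than in logarithms, so the heavier embryos dominate the fit. The estimates of A and B therefore need not agree closely with the rectified values.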
EXAMPLE 15.2.1 — J. W. Gowen and W. C. Price counted the number of lesions of
Aucuba mosaic virus developing after exposure to X-rays for various times (data made
available through courtesy of the investigators).

Minutes exposure      0     3    7.5    15    30    45    60
Count in hundreds   271   226   209    108    59    29    12

Plot the count as ordinate, then plot its logarithm. Derive the regression, $\hat{Y} = 2.432
- 0.02227X$, where Y is the logarithm of the count and X is minutes exposure.


EXAMPLE 15.2.2 — Repeat the fitting of the last example using natural logarithms. 
Verify the fact that the rate of decrease of hundreds of lesions per minute per hundred is 
(2.3026)(0.02227) = 0.05128.
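Examples 15.2.1 and 15.2.2 can also be checked by machine. The minimal Python sketch below fits the line to both the common and the natural logarithms of the counts. It shows that the natural-log slope is 2.3026 times the common-log slope.

```python
import math

# Example 15.2.1: minutes of X-ray exposure and lesion counts (hundreds)
minutes = [0, 3, 7.5, 15, 30, 45, 60]
counts = [271, 226, 209, 108, 59, 29, 12]


def fit_line(xs, ys):
    """Least-squares intercept and slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


a10, b10 = fit_line(minutes, [math.log10(c) for c in counts])  # common logs
ae, be = fit_line(minutes, [math.log(c) for c in counts])       # natural logs
print(f"common logs:  Y = {a10:.3f} {b10:+.5f}X")
print(f"natural logs: relative rate of decrease = {-be:.5f} per minute")
```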

EXAMPLE 15.2.3 — If the meaning of relative rate isn’t quite clear, try this approximate 
method of computing it. The increase in weight of the chick embryo during the thirteenth 
day is 1.130 - 0.738 = 0.392 gram; that is, the average rate during this period is 0.392 gm. 
per day. But the average weight during the same period is (1.130 + 0.738)/2 = 0.934 gm. 
The relative rate, or rate of increase of each gram, is therefore 0.392/0.934 = 0.42 gm. per 
day per gm. This differs from the average obtained in the whole period from 6 to 16 days, 
0.451, partly because the average weight as well as the increase in weight in the thirteenth 
day suffered some sampling variation, and partly because the correct relative rate is based 
on weight and increase in weight at any instant of time, not on day averages.

15.3 — The second degree polynomial. Faced by non-linear regression,
one often has no knowledge of a theoretical equation to use. In many
instances the second degree polynomial,

$$\hat{Y} = a + bX + cX^2,$$

will be found to fit the data satisfactorily. The graph is a parabola whose
axis is vertical, but usually only small segments of such a parabola appear
in the process of fitting. Instead of rectifying the data, a third variate is
added, the square of X. This introduces the methods of multiple regression. The calculations proceed exactly as in chapter 13, X and $X^2$ being
the two independent variates. It need only be remarked that $\sqrt{X}$, log X,
or 1/X might have been added instead of $X^2$ if the data had required it.

To illustrate the method and some of its applications, we present the 
data on wheat yield and protein content (3) in table 15.3.1 and figure 
15.3.1. The investigator wished to estimate the protein content for various 
yields. We shall also test the significance of the departure from linearity. 

The second column of the table contains the squares of the yields in 
column 1. The squares are treated in all respects like a third variable in 
multiple regression. The regression equation, calculated as usual, 

$$\hat{Y} = 17.703 - 0.3415X + 0.004075X^2,$$

is plotted in the figure. At small values of yield the second degree term
with its small coefficient is scarcely noticeable, the graph falling away
almost like a straight line. Toward the right, however, the term in $X^2$ has
bent the curve to practically a horizontal direction.
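The following short Python sketch, which is illustrative only, evaluates the fitted equation and its slope, b + 2cX. Setting the slope to zero puts the turning point of the parabola at X = 0.3415/[(2)(0.004075)], roughly 41.9 bushels. That is just below the largest yield in the sample (43 bushels). Beyond that point the parabola would turn upward, one more reason for the caution about extrapolation given later in this section.

```python
# Fitted regression of percentage protein (Y) on yield in bushels per acre (X)
b0, b1, b2 = 17.703, -0.3415, 0.004075


def protein(x):
    """Predicted percentage protein at a yield of x bushels per acre."""
    return b0 + b1 * x + b2 * x ** 2


# dY/dX = b1 + 2*b2*X, which is zero at the turning point of the parabola
turning_point = -b1 / (2 * b2)

for x in (10, 20, 30, 40):
    print(f"yield {x:2d}: protein {protein(x):.2f}%  slope {b1 + 2 * b2 * x:+.4f}")
print(f"slope is zero at X = {turning_point:.1f} bushels per acre")
```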



TABLE 15.3.1

Percentage Protein Content (Y) and Yield (X) of Wheat
From 91 Plots*

Yield,                   Percentage  |  Yield,                   Percentage
Bushels                  Protein     |  Bushels                  Protein
Per Acre, X   $X^2$      Y           |  Per Acre, X   $X^2$      Y
    43        1,849      10.7        |      19          361      13.9
    42        1,764      10.8        |      19          361      13.2
    39        1,521      10.8        |      19          361      13.8
    39        1,521      10.2        |      18          324      10.6
    38        1,444      10.3        |      18          324      13.0
    38        1,444       9.8        |      18          324      13.4
    37        1,369      10.1        |      18          324      13.7
    37        1,369      10.4        |      18          324      13.0
    36        1,296      10.3        |      17          289      13.4
    36        1,296      11.0        |      17          289      13.5
    36        1,296      12.2        |      17          289      10.8
    35        1,225      10.9        |      17          289      12.5
    35        1,225      12.1        |      17          289      12.7
    34        1,156      10.4        |      17          289      13.0
    34        1,156      10.8        |      17          289      13.8
    34        1,156      10.9        |      16          256      14.3
    34        1,156      12.6        |      16          256      13.6
    33        1,089      10.2        |      16          256      12.3
    32        1,024      11.8        |      16          256      13.0
    32        1,024      10.3        |      16          256      13.7
    32        1,024      10.4        |      15          225      13.3
    31          961      12.3        |      15          225      12.9
    31          961       9.6        |      14          196      14.2
    31          961      11.9        |      14          196      13.2
    31          961      11.4        |      12          144      15.5
    30          900       9.8        |      12          144      13.1
    30          900      10.7        |      12          144      16.3
    29          841      10.3        |      11          121      13.7
    28          784       9.8        |      11          121      18.3
    27          729      13.1        |      11          121      14.7
    26          676      11.0        |      11          121      13.8
    26          676      11.0        |      11          121      14.8
    25          625      12.8        |      10          100      15.6
    25          625      11.8        |      10          100      14.6
    24          576       9.9        |       9           81      14.0
    24          576      11.6        |       9           81      16.2
    24          576      11.8        |       9           81      15.8
    24          576      12.3        |       8           64       ?
    22          484      11.3        |       8           64      14.2
    22          484      10.4        |       8           64       ?
    22          484      12.6        |       7           49       ?
    21          441      13.0        |       7           49      14.?
    21          441      14.7        |       6           36       ?
    21          441      11.?        |       5           25       ?
    21          441      11.0        |
    20          400      12.8        |
    20          400      13.?        |

* Read from published graph. This accounts for the slight discrepancy between the
correlation we got and that reported by the author.
Fig. 15.3.1 — Regression of protein content on yield in wheat, 91 plots. $\hat{Y} = 17.703 - 0.3415X + 0.004075X^2$.


The analysis of variance and test of significance are shown in table 
15.3.2. The fitted regression on both X and $X^2$ gives a sum of squares of
deviations, 97.53, with 88 d.f. The sum of squares of deviations from a
linear regression, $\sum y^2 - (\sum xy)^2/\sum x^2$, is 110.48, with 89 d.f. The reduction in sum of squares, tested against the mean square remaining after
curvilinear regression, proves to be significant. The hypothesis of linear
regression is abandoned; there is a significant curvilinearity in the regression.

In table 15.3.1, many of the values of X (e.g., X = 39) have two or
more values of Y. With such data, the sum of squares of deviations from
the curved regression (88 d.f.) can be divided into two parts so as to provide
a more critical test of the fit of the quadratic. The technique is described
in the following section. In the present example this technique supports
the quadratic fit.

TABLE 15.3.2

Test of Significance of Departure From Linear Regression

Source of Variation                    Degrees of Freedom   Sum of Squares   Mean Square
Deviations from linear regression              89               110.48
Deviations from curved regression              88                97.53           1.11
Reduction in sum of squares                     1                12.95          12.95**

F = 12.95/1.11 = 11.7
The regression equation is useful also for estimating and interpolating. Confidence statements and tests of hypotheses are made as in
chapter 13.

As always in regression, either linear or curved, one should be wary of
extrapolation. The data may be incompetent to furnish evidence of trend
beyond their own range. Looking at figure 15.2.1, one might be tempted
by the excellent fit to assume the same growth rate before the sixth day
and after the sixteenth. The fact is, however, that there were rather sharp
breaks in the rate of growth at both these days. To be useful, extrapolation requires extensive knowledge and keen thinking.

EXAMPLE 15.3.1 — The test of significance of departure from linear regression in
table 15.3.2 may also be used to examine whether a rectifying transformation, of the type
illustrated in section 15.2, has produced a straight line relationship. Apply this test to the
chick embryo data in table 15.2.1 by fitting a parabola in X to log weights Y. Verify that
the parabola is

$$\hat{Y} = -2.595331 + 0.177276X + 0.000846X^2,$$

and that the test works out as follows:

Source of Variation                    Degrees of Freedom   Sum of Squares   Mean Square
Deviations from linear regression               9              0.007094
Deviations from quadratic regression            8              0.006480        0.000810
Curvilinearity of regression                    1              0.000614        0.000614

F = 0.76, with 1 and 8 d.f. When the X's are equally spaced, as in this example, a quicker
way of computing the test is given in section 15.6.
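The test in this example can be sketched in Python as follows. The sketch fits the parabola through the centered variates X and X², using the same normal equations as any two-variable multiple regression. It then splits the deviations as in table 15.3.2. The data are the three-decimal logarithms printed in table 15.2.1, and the function name is hypothetical. The sketch also prints the fitted coefficients, which is a useful check on the printed equation.

```python
# Example 15.3.1: ages X and the printed common logarithms Y from table 15.2.1
ages = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
log_w = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583,
         -0.372, -0.132, 0.053, 0.275, 0.449]


def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def curvature_test(xs, ys):
    """Table 15.3.2 layout: deviations from the line and from the parabola, and F."""
    n = len(xs)
    u, v, y = centered(xs), centered([x * x for x in xs]), centered(ys)
    suu = sum(p * p for p in u)
    suv = sum(p * q for p, q in zip(u, v))
    svv = sum(q * q for q in v)
    suy = sum(p * r for p, r in zip(u, y))
    svy = sum(q * r for q, r in zip(v, y))
    syy = sum(r * r for r in y)
    dev_linear = syy - suy ** 2 / suu                 # n - 2 d.f.
    det = suu * svv - suv ** 2
    b1 = (suy * svv - svy * suv) / det                # coefficient of X
    b2 = (svy * suu - suy * suv) / det                # coefficient of X**2
    dev_quad = syy - (b1 * suy + b2 * svy)            # n - 3 d.f.
    curvature = dev_linear - dev_quad                 # 1 d.f.
    F = curvature / (dev_quad / (n - 3))
    b0 = sum(ys) / n - b1 * sum(xs) / n - b2 * sum(x * x for x in xs) / n
    return dev_linear, dev_quad, curvature, F, (b0, b1, b2)


dev_lin, dev_q, curv, F, coefs = curvature_test(ages, log_w)
print(f"dev. linear {dev_lin:.6f}, dev. quadratic {dev_q:.6f}, "
      f"curvature {curv:.6f}, F = {F:.2f} with 1 and {len(ages) - 3} d.f.")
print("parabola coefficients b0, b1, b2:", [round(c, 6) for c in coefs])
```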


15.4 — Data having several Y's at each X value. If several values of Y
have been measured at each value X, the adequacy of a fitted polynomial
can be tested more thoroughly. Suppose that for each X, a group of n
values of Y are available. To illustrate for a linear model, if $Y_{ij}$ denotes
the jth member of the ith group, the linear model is

$$Y_{ij} = \alpha + \beta X_i + \varepsilon_{ij}, \qquad (15.4.1)$$

where the $\varepsilon_{ij}$ follow $\mathcal{N}(0, \sigma^2)$. It follows that the group means, $\bar{Y}_{i\cdot}$, are related to the $X_i$ by the linear relation

$$\bar{Y}_{i\cdot} = \alpha + \beta X_i + \bar{\varepsilon}_{i\cdot}.$$

(1) By fitting a quadratic regression of the $\bar{Y}_{i\cdot}$ on $X_i$, the test for
curvature in table 15.3.2 can be applied as before. Since it is important
in what follows, note that the residuals $\bar{\varepsilon}_{i\cdot}$ have variance $\sigma^2/n$, since each
$\bar{\varepsilon}_{i\cdot}$ is the mean of n independent residuals from relation 15.4.1.

(2) The new feature is that the deviations of the $Y_{ij}$ from their group
means $\bar{Y}_{i\cdot}$ supply an independent estimate of $\sigma^2$. The pooled estimate is

$$s^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2 \Big/ k(n - 1),$$

with k(n - 1) d.f. If we multiply the mean squares in analysis (1) by n,
in order to make parts (1) and (2) comparable, we have the analysis of
variance in table 15.4.1.


TABLE 15.4.1

Analysis of Variance for Tests of Linear Regression

Source of Variation                                      Degrees of Freedom   Mean Square
Linear regression of $\bar{Y}_{i\cdot}$ on $X_i$                  1                $s_1^2$
Quadratic regression of $\bar{Y}_{i\cdot}$ on $X_i$               1                $s_2^2$
Deviations of $\bar{Y}_{i\cdot}$ from quadratic                 k - 3              $s_d^2$
Pooled within groups                                          k(n - 1)            $s^2$
Total                                                          kn - 1


The following results are basic to the interpretation of this table.
If the population regression is linear, the mean square $s_2^2$ is an unbiased
estimate of $\sigma^2$; if the population regression is curved, $s_2^2$ tends to become
large. If the population regression is either linear or quadratic, $s_d^2$ is an
unbiased estimate of $\sigma^2$. When will $s_d^2$ tend to become much larger than
$\sigma^2$? Either if the population regression is non-linear but is not adequately
represented by a quadratic (for instance, it might be a third degree curve,
or one with a periodic feature), or if there are sources of variation that
are constant within any group but vary from group to group. This could
happen if the measurements in different groups were taken at different
times or from different hospitals or bushes. The pooled within-group
variance $s^2$ is an unbiased estimate of $\sigma^2$ no matter what the shape of the
relation between $\bar{Y}_{i\cdot}$ and $X_i$.

Consequently, first compute the F-ratio, $s_d^2/s^2$, with (k - 3) and
k(n - 1) d.f. If this is significant, look at the plot of $\bar{Y}_{i\cdot}$ against $X_i$ to see
whether a higher degree polynomial or a different type of mathematical
relationship is indicated. Examination of the deviations of the $\bar{Y}_{i\cdot}$ from
the fitted quadratic for signs of a systematic trend is also helpful. If no
systematic trend is found, the most likely explanation is that some extra
between-group source of variation has entered the data.

If $s_d^2/s^2$ is clearly non-significant, form the pooled mean square of
$s_d^2$ and $s^2$. Call this $s_p^2$, with (kn - 3) d.f. Then test $F = s_2^2/s_p^2$, with 1
and (kn - 3) d.f., as a test of curvature of the relation.

The procedure is illustrated by the data in table 15.4.2, made avail- 
able through the courtesy of B. J. Vos and W. T. Dawson. The point at 
issue is whether there is a linear relation between the lethal dose of 
ouabain, injected into cats, and the rate of injection. Four rates were 
used, each double the preceding. 

First, the total sum of squares of the lethal doses, 21,744, is analyzed
into "between rates," 16,093, and "within rate groups," 5,651. Note
that the number of cats, $n_i$, differed slightly from group to group.
TABLE 15.4.2

Lethal Dose (Minus ?0 Units) of U.S. Standard Ouabain by Slow
Intravenous Injection in Cat Until the Heart Stops

                        $X_i$ = Rate of Injection in (mg./kg./min.)/1,045.75
                           1        2        4        8       Total
                           5        3       34       51
                           9        6       34       56
                          11       22       38       62
                          13       27       40       63
                          14       27       46       70
                          16       28       58       73
                          17       28       60       76
                          20       37       60       89
                          22       40       65       92
                          28       42
                          31       50
                          31
$\sum Y_{ij} = Y_{i\cdot}$               217      310      435      632      1,594
$n_i$                       12       11        9        9         41
$\bar{Y}_{i\cdot}$                     18.1     28.2     48.3     70.2
$\sum Y_{ij}^2$              4,727   10,788   22,261   45,940     83,716


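The inequality in the $n_i$ must be taken into account in setting up the
equations for the regression of $\bar{Y}_{i\cdot}$ on $X_i$ and $X_i^2$. Compute

$$\begin{aligned}
\sum n_iX_i &= 12(1) + 11(2) + 9(4) + 9(8) = 142\\
\sum n_iX_i^2 &= 12(1) + 11(4) + 9(16) + 9(64) = 776\\
\sum n_iX_i^3 &= 12(1) + 11(8) + 9(64) + 9(512) = 5{,}284
\end{aligned}$$

and similarly $\sum n_iX_i^4 = 39{,}356$. We need also

$$\sum n_iX_i\bar{Y}_{i\cdot} = \sum X_iY_{i\cdot} = 1(217) + 2(310) + 4(435) + 8(632) = 7{,}633$$

and $\sum X_i^2Y_{i\cdot} = 48{,}865$.

Each quantity is then corrected for the mean in the usual way. For
example,

$$\sum n_i(X_i^2 - \overline{X^2})^2 = \sum n_iX_i^4 - \frac{(\sum n_iX_i^2)^2}{\sum n_i} = 39{,}356 - \frac{(776)^2}{41} = 24{,}668.8$$

$$\sum n_i(X_i - \bar{X})(\bar{Y}_{i\cdot} - \bar{Y}) = \sum X_iY_{i\cdot} - \frac{(\sum n_iX_i)(\sum Y_{i\cdot})}{\sum n_i} = 7{,}633 - \frac{(142)(1{,}594)}{41} = 2{,}112.3$$

To complete the quantities needed for the normal equations, you may
verify that

$$\sum n_i(X_i - \bar{X})^2 = 284.2, \qquad \sum n_i(X_i - \bar{X})(X_i^2 - \overline{X^2}) = 2{,}596.4,$$
$$\sum n_i(X_i^2 - \overline{X^2})(\bar{Y}_{i\cdot} - \bar{Y}) = 18{,}695.6$$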
The normal equations for $b_1$ and $b_2$ are

$$\begin{aligned}
284.2b_1 + 2{,}596.4b_2 &= 2{,}112.3\\
2{,}596.4b_1 + 24{,}668.8b_2 &= 18{,}695.6
\end{aligned}$$

In the usual way, the reduction in sum of squares of Y due to the regression
on $b_1$ and $b_2$ is found to be 16,082, while for the linear regression, the reduction is 15,700. The final analysis of variance appears in table 15.4.3.


TABLE 15.4.3

Tests of Deviations From Linear and Quadratic Regression

Source of Variation           Degrees of Freedom   Sum of Squares   Mean Square
Linear regression on X                1                15,700           15,700
Quadratic regression on X             1                   382              382
Deviations from quadratic             1                    11               11
Pooled within groups                 37                 5,651              153
Total                                40                21,744



The mean square 11 for the deviations from the quadratic is much
lower than the within-groups mean square, though not unusually so for
only 1 d.f. The pooled average of these two mean squares is 149, with 38
d.f. For the test of curvature, F = 382/149 = 2.56, with 1 and 38 d.f.,
lying between the 25% and the 10% levels. We conclude that the results are
consistent with a linear relation in the population.
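The whole analysis of tables 15.4.2 and 15.4.3 can be reproduced with the minimal Python sketch below, which uses only the standard library. Each $X_i$ enters with weight $n_i$ in the corrected sums of squares and products, exactly as in the hand computation. Small differences from the printed table come from rounding in the text, for example about 5,650 versus 5,651 for the within-groups sum of squares.

```python
# Table 15.4.2: lethal doses grouped by rate of injection X_i
groups = {
    1: [5, 9, 11, 13, 14, 16, 17, 20, 22, 28, 31, 31],
    2: [3, 6, 22, 27, 27, 28, 28, 37, 40, 42, 50],
    4: [34, 34, 38, 40, 46, 58, 60, 60, 65],
    8: [51, 56, 62, 63, 70, 73, 76, 89, 92],
}


def curvature_anova(groups):
    """Tests of linear and quadratic regression of group means, unequal n_i."""
    xs = sorted(groups)
    n = {x: len(groups[x]) for x in xs}
    T = {x: sum(groups[x]) for x in xs}               # group totals Y_i.
    N, G = sum(n.values()), sum(T.values())
    total = sum(y * y for x in xs for y in groups[x]) - G * G / N
    between = sum(T[x] ** 2 / n[x] for x in xs) - G * G / N
    within = total - between

    def wsum(f):                                       # sum of n_i * f(X_i)
        return sum(n[x] * f(x) for x in xs)

    s11 = wsum(lambda x: x ** 2) - wsum(lambda x: x) ** 2 / N
    s12 = wsum(lambda x: x ** 3) - wsum(lambda x: x) * wsum(lambda x: x ** 2) / N
    s22 = wsum(lambda x: x ** 4) - wsum(lambda x: x ** 2) ** 2 / N
    s1y = sum(x * T[x] for x in xs) - wsum(lambda x: x) * G / N
    s2y = sum(x * x * T[x] for x in xs) - wsum(lambda x: x ** 2) * G / N

    linear = s1y ** 2 / s11
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    quadratic = b1 * s1y + b2 * s2y                    # reduction due to X and X**2
    curvature = quadratic - linear                     # 1 d.f.
    deviations = between - quadratic                   # k - 3 d.f.
    k = len(xs)
    pooled_ms = (deviations + within) / ((k - 3) + (N - k))
    return {"linear": linear, "curvature": curvature, "deviations": deviations,
            "within": within, "pooled_ms": pooled_ms, "F": curvature / pooled_ms}


for name, value in curvature_anova(groups).items():
    print(f"{name:>10}: {value:,.2f}")
```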

EXAMPLE 15.4.1 — The following data, selected from Swanson and Smith (4) to provide an example with equal n, show the total nitrogen content Y (grams per 100 cc. of
plasma) of rat blood plasma at nine ages X (days).

Age of Rat
(days)     25     37     50     60     80    100    130    180    360
          0.83   0.98   1.07   1.09   0.97   1.14   1.22   1.20   1.16
          0.77   0.84   1.01   1.03   1.08   1.04   1.07   1.19   1.29
          0.88   0.99   1.06   1.06   1.16   1.00   1.09   1.33   1.25
          0.94   0.87   0.96   1.08   1.11   1.08   1.15   1.21   1.43
          0.89   0.90   0.88   0.94   1.03   0.89   1.14   1.20   1.20
          0.83   0.82   1.01   1.01   1.17   1.03   1.19   1.07   1.06
Total     5.14   5.40   5.99   6.21   6.52   6.18   6.86   7.20   7.39

A plot of the Y totals against X shows that (i) the Y values for X = 100 are abnormally
low and require special investigation, (ii) the relation is clearly curved. Omit the data for
X = 100 and test the deviations from a parabolic regression against the within-groups
mean square. Ans. F = 1.4.
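A Python sketch of the test requested in example 15.4.1, using only the standard library, follows. The group means at the eight remaining ages are fitted by a parabola. Their deviations sum of squares is multiplied by n = 6 to put it on the per-observation scale of table 15.4.1, and the result is compared with the within-groups mean square. The script prints the F-ratio with its degrees of freedom for comparison with the answer given above.

```python
# Example 15.4.1: total nitrogen (g per 100 cc plasma); one row per rat
ages = [25, 37, 50, 60, 80, 100, 130, 180, 360]
rats = [
    [0.83, 0.98, 1.07, 1.09, 0.97, 1.14, 1.22, 1.20, 1.16],
    [0.77, 0.84, 1.01, 1.03, 1.08, 1.04, 1.07, 1.19, 1.29],
    [0.88, 0.99, 1.06, 1.06, 1.16, 1.00, 1.09, 1.33, 1.25],
    [0.94, 0.87, 0.96, 1.08, 1.11, 1.08, 1.15, 1.21, 1.43],
    [0.89, 0.90, 0.88, 0.94, 1.03, 0.89, 1.14, 1.20, 1.20],
    [0.83, 0.82, 1.01, 1.01, 1.17, 1.03, 1.19, 1.07, 1.06],
]

keep = [j for j, x in enumerate(ages) if x != 100]    # omit the X = 100 group
xs = [ages[j] for j in keep]
cols = [[row[j] for row in rats] for j in keep]
n, k = len(rats), len(keep)
means = [sum(col) / n for col in cols]


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def parabola_residual_ss(xs, ys):
    """Residual sum of squares after fitting y = a + b*x + c*x**2."""
    u, v, y = centered(xs), centered([x * x for x in xs]), centered(ys)
    suu = sum(p * p for p in u)
    suv = sum(p * q for p, q in zip(u, v))
    svv = sum(q * q for q in v)
    suy = sum(p * r for p, r in zip(u, y))
    svy = sum(q * r for q, r in zip(v, y))
    det = suu * svv - suv ** 2
    b = (suy * svv - svy * suv) / det
    c = (svy * suu - suy * suv) / det
    return sum(r * r for r in y) - (b * suy + c * svy)


within_ms = sum(sum((y - m) ** 2 for y in col)
                for col, m in zip(cols, means)) / (k * (n - 1))
# Group means have variance sigma^2 / n, so their deviations SS is multiplied by n
deviations_ms = n * parabola_residual_ss(xs, means) / (k - 3)
F = deviations_ms / within_ms
print(f"F = {F:.2f} with {k - 3} and {k * (n - 1)} d.f.")
```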
15.5 — Test of departure from linear regression in covariance analysis.
As in any other correlation and regression work, it is necessary in covariance to be assured that the regression is linear. It will be recalled that
in the standard types of layout, one-way classifications, two-way classifications (randomized blocks) and Latin squares, the regression of Y on X
is computed from the Residual or Error line in the analysis of variance. A
graphical method of checking on linearity, which is often sufficient, is to
plot the residuals of Y from the analysis of variance model against the corresponding residuals of X, looking for signs of curvature.

The numerical method of checking is to add a term in $X^2$ to the
model. Writing $X_1 = X$, $X_2 = X^2$, work out the residual or error sums
of squares of Y, $X_1$, and $X_2$, and the error sums of products of $X_1X_2$, $YX_1$,
and $YX_2$, as was illustrated in section 14.8 for a one-way classification.
From these data, compute the test of significance of departure from linear
regression as in table 15.3.2.

If the regression is found to be curved, the treatment means are 
adjusted for the parabolic regression. The calculations follow the method 
given in section 14.8. 

15.6 — Orthogonal polynomials. If the values of X are equally spaced,
the fitting of the polynomial

$$\hat{Y} = b_0 + b_1X + b_2X^2 + b_3X^3 + \cdots$$

is speeded up by the use of tables of orthogonal polynomials. The essential step is to replace $X^i$ (i = 1, 2, 3, ...) by a polynomial of degree
i in X, which we will call $X_i$. The coefficients in these polynomials are
chosen so that

$$\sum X_i = 0; \qquad \sum X_iX_j = 0 \quad (i \ne j),$$

where the sums are over the n values of X in the sample. The different
polynomials are orthogonal to one another. Explicit formulas for these
polynomials are given later in this section.

Instead of calculating the polynomial regression of Y on X in the
form above, we calculate it in the form

$$\hat{Y} = B_0 + B_1X_1 + B_2X_2 + B_3X_3 + \cdots,$$

which may be shown to give the same fitted polynomial. On account of
the orthogonality of the $X_i$, we have the results

$$B_0 = \bar{Y}; \qquad B_i = \sum X_iY \Big/ \sum X_i^2 \quad (i = 1, 2, 3, \ldots).$$

The values of the $X_i$ and of $\sum X_i^2$ are provided in the tables, making the
computation of $B_i$ simple. Further, the reductions in $\sum(Y - \bar{Y})^2$ due
to the successive terms in the polynomial are given by

$$\frac{(\sum X_1Y)^2}{\sum X_1^2}, \quad \frac{(\sum X_2Y)^2}{\sum X_2^2}, \quad \frac{(\sum X_3Y)^2}{\sum X_3^2}, \quad \text{and so on.}$$

Thus it is easy to check whether the addition of a higher power in X to the
polynomial produces a marked reduction in the residual sum of squares.
As a time-saver, the orthogonal polynomials are most effective when the 
calculations are done on a desk calculator. With an electronic computer, 
the routine programs for fitting a multiple regression can be used to fit 
the equation in its original form. Most programs also provide the reduc- 
tions in sum of squares due to each successive power. 

Tables of the first five polynomials are given in (5) up to n = 75, and
of the first six in (6) up to n = 52. Table A 17 (p. 572) shows these polynomials up to n = 12. For illustration, a polynomial will be fitted to the
chick embryo data, though, as we saw in section 15.2, these data are more
aptly fitted as an exponential growth curve.

Table 15.6.1 shows the weights (Y) and the values of $X_1$, $X_2$, $X_3$, $X_4$,
$X_5$ for n = 11, read from table A 17. To save space, most tables give the
$X_i$ values only for the upper half of the values of X. In our sample these
are the values from X = 11 to X = 16. The method of writing down the $X_i$
for the lower half of the sample is seen in table 15.6.1. For the terms of odd
degree, $X_1$, $X_3$, and $X_5$, the signs are changed in the lower half; for terms
of even degree, $X_2$ and $X_4$, the signs remain the same.


TABLE 15.6.1

Fitting a Fourth Degree Polynomial to Chick Embryo Weights

Age, X      Dry Wt., Y                                                    Fitted
(days)      (grams)       $X_1$    $X_2$    $X_3$    $X_4$    $X_5$       $\hat{Y}$
   6          0.029        -5       15      -30        6       -3        0.026
   7          0.052        -4        6        6       -6        6        0.056
   8          0.079        -3       -1       22       -6        1        0.086
   9          0.125        -2       -6       23       -1       -4        0.119
  10          0.181        -1       -9       14        4       -4        0.171
  11          0.261         0      -10        0        6        0        0.265
  12          0.425         1       -9      -14        4        4        0.434
  13          0.738         2       -6      -23       -1        4        0.718
  14          1.130         3       -1      -22       -6       -1        1.169
  15          1.882         4        6       -6       -6       -6        1.847
  16          2.812         5       15       30        6        3        2.822

$\sum X_i^2$                      110      858    4,290      286      156
$\lambda_i$                         1        1      5/6     1/12     1/40
$\sum X_iY$         7.714      25.858   39.768   31.873    1.315   -0.254
$B_i$            0.701273    0.235073 0.046349 0.007430 0.004598


We shall suppose that the objective is to find the polynomial of lowest 
degree that seems an adequate fit. Consequently, the reduction in sum 
of squares will be tested as each successive term is added. At each stage, 
calculate

$$\sum X_iY, \qquad B_i = \sum X_iY \Big/ \sum X_i^2$$

(shown under table 15.6.1), and the reduction in sum of squares,
$(\sum X_iY)^2/\sum X_i^2$, entered in table 15.6.2. For the linear term, the F-value is
(6.078511)/(0.232177) = 26.2. The succeeding F-values for the quadratic
and cubic terms are even larger, 59.9 and 173.4. For the $X_4$ (quartic)
term, F is 10.3, significant at the 5% but not at the 1% level. The 5th
degree term, however, has an F less than 1. As a precautionary move,
we should check the 6th degree term also, but for this illustration we will
stop and conclude that a 4th degree polynomial is a satisfactory fit.


TABLE 15.6.2

Reductions in Sum of Squares Due to Successive Terms

Source                           Degrees of Freedom   Sum of Squares   Mean Square      F
Total: $\sum(Y - \bar{Y})^2$                10              8.168108
Reduction to linear                  1              6.078511
Deviations from linear               9              2.089597        0.232177      26.2
Reduction to quadratic               1              1.843233
Deviations from quadratic            8              0.246364        0.030796      59.9
Reduction to cubic                   1              0.236803
Deviations from cubic                7              0.009561        0.001366     173.4
Reduction to quartic                 1              0.006046
Deviations from quartic              6              0.003515        0.000586      10.3
Reduction to quintic                 1              0.000414
Deviations from quintic              5              0.003101        0.000620       0.7

For graphing the polynomial, the estimated values $\hat{Y}$ for each value
of X are easily computed from table 15.6.1:

$$\hat{Y} = B_0 + B_1X_1 + B_2X_2 + B_3X_3 + B_4X_4.$$

Note that $B_0 = \bar{Y} = 0.701273$. At X = 6,

$$\hat{Y} = 0.701273 - 5(0.235073) + 15(0.046349) - 30(0.007430) + 6(0.004598) = 0.026,$$

and so on. Figure 15.6.1 shows the fit by a straight line, obviously poor,
the 2nd degree polynomial, considerably better, and the 4th degree polynomial.
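The calculations of tables 15.6.1 and 15.6.2 can be sketched in a few lines of Python. The polynomial values are typed in from table 15.6.1. Everything else is computed: the $B_i$, the reductions, the F-ratios, and the fitted values. Because the total sum of squares is recomputed from the weights, the last digit of some F-values may differ slightly from table 15.6.2.

```python
# Table 15.6.1: dry weights Y (X = 6, ..., 16) and tabled orthogonal polynomials, n = 11
Y = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261, 0.425, 0.738, 1.130, 1.882, 2.812]
P = {
    1: [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
    2: [15, 6, -1, -6, -9, -10, -9, -6, -1, 6, 15],
    3: [-30, 6, 22, 23, 14, 0, -14, -23, -22, -6, 30],
    4: [6, -6, -6, -1, 4, 6, 4, -1, -6, -6, 6],
    5: [-3, 6, 1, -4, -4, 0, 4, 4, -1, -6, 3],
}

n = len(Y)
B = {0: sum(Y) / n}                              # B_0 = mean of Y
residual_ss = sum((y - B[0]) ** 2 for y in Y)    # total SS, n - 1 d.f.
residual_df = n - 1
F_values = {}
for i in sorted(P):
    sxy = sum(p * y for p, y in zip(P[i], Y))    # sum of X_i * Y
    sxx = sum(p * p for p in P[i])               # sum of X_i ** 2
    B[i] = sxy / sxx
    reduction = sxy ** 2 / sxx                   # 1 d.f. for each new term
    residual_ss -= reduction
    residual_df -= 1
    F_values[i] = reduction / (residual_ss / residual_df)
    print(f"degree {i}: B = {B[i]:.6f}, reduction = {reduction:.6f}, F = {F_values[i]:.1f}")

# Fitted 4th degree values: Y-hat = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4
fitted = [B[0] + sum(B[i] * P[i][j] for i in range(1, 5)) for j in range(n)]
print("fitted:", [round(f, 3) for f in fitted])
```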

To express the polynomial as an equation in the original X variable
is more tedious. For this, we need formulas giving $X_i$ in terms of X
and its powers. In the standard method, developed by Fisher and used to
compute the polynomial tables, one starts with a slightly different set of
polynomials, $\xi_i$, which satisfy the recurrence relations

$$\xi_0 = 1, \qquad \xi_1 = X - \bar{X},$$
$$\xi_{i+1} = \xi_1\xi_i - \frac{i^2(n^2 - i^2)}{4(4i^2 - 1)}\,\xi_{i-1}.$$

These polynomials are orthogonal, but when their values are tabulated
for each member of the sample, these values are not always whole numbers. Consequently, Fisher found by inspection the multiplier $\lambda_i$ which
would make $X_i = \lambda_i\xi_i$ the smallest set of integers. This makes calculations
easier for the user. The values of the $\lambda_i$ are shown under table 15.6.1,
and under each polynomial in table A 17 and in references (5) and (6).
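Fisher's recurrence and the choice of $\lambda_i$ are easy to mechanize. The following is a minimal Python sketch that assumes Python 3.9 or later for the multi-argument `math.gcd` and `math.lcm`. It uses exact fractions and takes X = 1, 2, ..., n (only the unit spacing matters). It picks $\lambda_i$ as the positive multiplier that turns the $\xi_i$ values into the smallest set of integers. For n = 11 it should reproduce the columns and multipliers under table 15.6.1, and for n = 6 those of example 15.6.3.

```python
from fractions import Fraction
from math import gcd, lcm   # multi-argument gcd/lcm need Python 3.9+


def fisher_xi(n, degree):
    """xi_0 .. xi_degree at X = 1, ..., n via Fisher's recurrence (exact fractions)."""
    x = [Fraction(X) - Fraction(n + 1, 2) for X in range(1, n + 1)]
    xi = [[Fraction(1)] * n, x]
    for i in range(1, degree):
        k = Fraction(i * i * (n * n - i * i), 4 * (4 * i * i - 1))
        xi.append([x[j] * xi[i][j] - k * xi[i - 1][j] for j in range(n)])
    return xi


def integer_scaling(values):
    """Multiplier lambda giving the smallest integers lambda * xi (values not all zero)."""
    den = lcm(*(v.denominator for v in values))       # clears all denominators
    g = gcd(*(int(v * den) for v in values))          # removes any common factor
    lam = Fraction(den, g)
    return lam, [int(v * lam) for v in values]


for n in (6, 11):
    xi = fisher_xi(n, 5 if n == 11 else 2)
    for i in range(1, len(xi)):
        lam, X_i = integer_scaling(xi[i])
        print(f"n={n} X_{i}: lambda={lam}, values={X_i}")
```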

Now to the calculations in our example. The first step is to multiply
each $B_i$ by the corresponding $\lambda_i$. This gives

$$B_1' = 0.235073; \quad B_2' = 0.046349; \quad B_3' = 0.006192; \quad B_4' = 0.0003832.$$

These are the coefficients for the regression of Y on the $\xi_i$, so that

$$\hat{Y} = \bar{Y} + B_1'\xi_1 + B_2'\xi_2 + B_3'\xi_3 + B_4'\xi_4. \qquad (15.6.1)$$
The general equations connecting the $\xi_i$ with X are as follows:

$$\begin{aligned}
\xi_1 &= X - \bar{X} = x\\
\xi_2 &= x^2 - \frac{n^2 - 1}{12}\\
\xi_3 &= x^3 - \frac{3n^2 - 7}{20}\,x\\
\xi_4 &= x^4 - \frac{3n^2 - 13}{14}\,x^2 + \frac{3(n^2 - 1)(n^2 - 9)}{560}\\
\xi_5 &= x^5 - \frac{5(n^2 - 7)}{18}\,x^3 + \frac{15n^4 - 230n^2 + 407}{1{,}008}\,x
\end{aligned}$$

By substitution into formula (15.6.1), $\hat{Y}$ is expressed as a polynomial in
$x = X - \bar{X}$. If it is satisfactory to stop at this stage, there are two advantages. Further calculation is avoided, and there is less loss of decimal
accuracy. However, to complete the example, we note that n = 11 and
$\bar{X} = 11$. Hence, in terms of X,

$$\begin{aligned}
\xi_1 &= X - 11\\
\xi_2 &= (X - 11)^2 - 10 = X^2 - 22X + 111\\
\xi_3 &= (X - 11)^3 - 17.8(X - 11) = X^3 - 33X^2 + 345.2X - 1{,}135.2\\
\xi_4 &= (X - 11)^4 - 25(X - 11)^2 + 72 = X^4 - 44X^3 + 701X^2 - 4{,}774X + 11{,}688
\end{aligned}$$

Hence, finally, using formula (15.6.1),

$$\begin{aligned}
\hat{Y} &= 0.701273 + 0.235073\xi_1 + 0.046349\xi_2 + 0.006192\xi_3 + 0.0003832\xi_4\\
&= 0.701273 + 0.235073(X - 11) + 0.046349(X^2 - 22X + 111)\\
&\quad + 0.006192(X^3 - 33X^2 + 345.2X - 1{,}135.2)\\
&\quad + 0.0003832(X^4 - 44X^3 + 701X^2 - 4{,}774X + 11{,}688)\\
&= 0.7099 - 0.47652X + 0.110636X^2 - 0.010669X^3 + 0.0003832X^4
\end{aligned}$$
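The back-substitution into powers of X can also be done by machine. The sketch below is in Python with the standard library only, and its helper names are hypothetical. It builds each $\xi_i$ as a list of coefficients in X from the same recurrence, with $\bar{X} = 11$ and n = 11. It then combines them with the coefficients $B_i'$ of formula (15.6.1).

```python
def poly_add(p, q):
    """Add two coefficient lists (lowest power first)."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_mul(p, q):
    """Multiply two coefficient lists (lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def xi_in_powers_of_X(n, xbar, degree):
    """Fisher's xi_0 .. xi_degree as coefficient lists in X (unit spacing)."""
    xi = [[1.0], [-xbar, 1.0]]                    # xi_0 = 1, xi_1 = X - xbar
    for i in range(1, degree):
        k = i * i * (n * n - i * i) / (4 * (4 * i * i - 1))
        xi.append(poly_add(poly_mul(xi[1], xi[i]), [-k * c for c in xi[i - 1]]))
    return xi


# Chick embryo example: n = 11, X-bar = 11, coefficients B'_i from formula (15.6.1)
xi = xi_in_powers_of_X(11, 11.0, 4)
coef = {0: 0.701273, 1: 0.235073, 2: 0.046349, 3: 0.006192, 4: 0.0003832}
y_hat = [0.0]
for i, b in coef.items():
    y_hat = poly_add(y_hat, [b * c for c in xi[i]])

print("xi_4 =", [round(c, 6) for c in xi[4]])
print("Y-hat coefficients (X^0 ... X^4):", [round(c, 6) for c in y_hat])
```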

In table 15.6.1 there is a further shortcut which we did not use. In
computing $\sum X_1Y$, the Y's at the two ends of the sample, say $Y_n$ and $Y_1$,
are multiplied by 5 and -5; $Y_{n-1}$ and $Y_2$ are multiplied by 4 and -4.
If we form the differences $Y_n - Y_1$, $Y_{n-1} - Y_2$, and so on, only the set of
multipliers 5, 4, 3, 2, 1 need be used. This device works for any $\sum X_iY$ in
which i is odd. With i even, we form the sums $Y_n + Y_1$, $Y_{n-1} + Y_2$, and
so on. The method is worked out for these data in example 15.6.1.
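The shortcut can be sketched in Python as follows. The sketch assumes n is odd, as in this example, so that one middle value is left over. For odd i that value's multiplier is zero, and for even i it is simply added in. The upper-half multipliers are those printed in table 15.6.1 for $X_1$ and $X_2$.

```python
# Chick embryo weights from table 15.6.1 (n = 11, odd)
Y = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261, 0.425, 0.738, 1.130, 1.882, 2.812]
n = len(Y)
h = n // 2                      # number of outside-in pairs

# Pair the values from the outside in: (Y_n, Y_1), (Y_{n-1}, Y_2), ...
diffs = [Y[n - 1 - j] - Y[j] for j in range(h)]
sums = [Y[n - 1 - j] + Y[j] for j in range(h)]
middle = Y[h]

# Upper-half values of X_1 and X_2, outermost first, as printed in short tables
X1_upper = [5, 4, 3, 2, 1]
X2_upper = [15, 6, -1, -6, -9]
X2_middle = -10

sum_X1Y = sum(p * d for p, d in zip(X1_upper, diffs))                     # odd degree
sum_X2Y = sum(p * s for p, s in zip(X2_upper, sums)) + X2_middle * middle  # even degree
print(round(sum_X1Y, 3), round(sum_X2Y, 3))
```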

EXAMPLE 15.6.1 — In table 15.6.1, form the sums and differences of pairs of values of Y,
working in from the outside. Verify that these give the results shown below, and that
the $\sum X_iY$ values are in agreement with those given in table 15.6.1.

 Sums     $X_2$    $X_4$        Diffs.    $X_1$    $X_3$
0.261     -10       6         0.261       0        0
0.606      -9       4         0.244       1      -14
0.863      -6      -1         0.613       2      -23
1.209      -1      -6         1.051       3      -22
1.934       6      -6         1.830       4       -6
2.841      15       6         2.783       5       30

EXAMPLE 15.6.2 — Here are six points on the cubic $Y = 9X - 6X^2 + X^3$: (0, 0),
(1, 4), (2, 2), (3, 0), (4, 4), (5, 20). Carry through the computations for fitting a linear,
quadratic, and cubic regression. Verify that there is no residual sum of squares after fitting
the cubic, and that the polynomial values at that stage are exactly the Y's.

EXAMPLE 15.6.3 — The method of constructing orthogonal polynomials can be illustrated by finding $X_1$ and $X_2$ when n = 6.

  (1)        (2)              (3)               (4)          (5)
   X     $\xi_1 = x$     $X_1 = 2\xi_1$        $\xi_2$    $X_2 = \frac{3}{2}\xi_2$
   1        -5/2              -5               10/3           5
   2        -3/2              -3               -2/3          -1
   3        -1/2              -1               -8/3          -4
   4         1/2               1               -8/3          -4
   5         3/2               3               -2/3          -1
   6         5/2               5               10/3           5

Start with X = 1, 2, 3, 4, 5, 6, with $\bar{X} = 7/2$. Verify that the values of $\xi_1 = x = X - \bar{X}$ are
as shown in column (2). Since the $\xi_1$ are not whole numbers, we take $\lambda_1 = 2$, giving $X_1 = 2\xi_1$,
column (3). To find $\xi_2$, write

$$\xi_2 = \xi_1^2 - h\xi_1 - c.$$

This is a quadratic in X. We want $\sum \xi_2 = 0$. This gives

$$\sum \xi_1^2 - nc = 0; \quad \text{i.e., } \tfrac{35}{2} - 6c = 0; \quad c = \tfrac{35}{12}.$$

Further, we want $\sum \xi_1\xi_2 = 0$, giving

$$\sum \xi_1^3 - h\sum \xi_1^2 - c\sum \xi_1 = 0; \quad \text{i.e., } h\sum \xi_1^2 = 0; \quad h = 0.$$

Hence, $\xi_2 = \xi_1^2 - \tfrac{35}{12}$. Verify the $\xi_2$ values in column (4). To convert these to integers,
multiply by $\lambda_2 = \tfrac{3}{2}$.


15.7 — A general method of fitting non-linear regressions. Suppose
that the population relation between Y and X is of the form

$$Y_i = f(\alpha, \beta, \gamma, X_i) + \varepsilon_i \qquad (i = 1, 2, \ldots, n)$$

where f is a regression function containing $X_i$ and the parameters α, β, γ.
(There may be more than one X-variable.) If the residuals $\varepsilon_i$ have zero
means and constant variance, the least squares method of fitting the regression is to estimate the values of α, β, γ by minimizing

$$\sum_{i=1}^{n} \left[Y_i - f(\alpha, \beta, \gamma, X_i)\right]^2.$$

This section presents a general method of carrying out the calculations.
The details require a knowledge of partial differentiation, but the approach is a simple one.

The difficulty arises not because of non-linearity in $X_i$ but because of
non-linearity in one or more of the parameters α, β, γ. The parabola
$(\alpha + \beta X + \gamma X^2)$ is fitted by the ordinary methods of multiple linear regression, because it is linear in α, β, and γ. Consider the asymptotic regression, $\alpha - \beta(\gamma^X)$. If the value of γ were known in advance, we could write
$X_1 = \gamma^X$. The least squares estimates of α and β would then be given by
fitting an ordinary linear regression of Y on $X_1$. When γ must be estimated
from the data, however, the methods of linear regression cannot be applied.

The first step in the general method is to obtain good initial estimates $a_1$, $b_1$, $c_1$ of the final least-squares estimates $\hat\alpha$, $\hat\beta$, $\hat\gamma$. For the common types of non-linear functions, various techniques for doing this have been developed, sometimes graphical, sometimes by special studies of this problem. Next, we use Taylor's theorem. This states that if $f(\alpha, \beta, \gamma, X)$ has continuous partial derivatives with respect to $\alpha$, $\beta$, and $\gamma$, and if $(\alpha - a_1)$, $(\beta - b_1)$, and $(\gamma - c_1)$ are small,

$$f(\alpha, \beta, \gamma, X_i) \doteq f(a_1, b_1, c_1, X_i) + (\alpha - a_1) f_\alpha + (\beta - b_1) f_\beta + (\gamma - c_1) f_\gamma$$

The symbol $\doteq$ means "is approximately equal to." The symbols $f_\alpha$, $f_\beta$, $f_\gamma$ denote the partial derivatives of $f$ with respect to $\alpha$, $\beta$, and $\gamma$, respectively, evaluated at the point $a_1$, $b_1$, $c_1$. For example, in the asymptotic regression,

$$f(\alpha, \beta, \gamma, X_i) = \alpha - \beta(\gamma^{X_i})$$

we have

$$f_\alpha = 1; \quad f_\beta = -c_1^{X_i}; \quad f_\gamma = -b_1 X_i c_1^{X_i - 1}$$

Since $a_1$, $b_1$, and $c_1$ are known, the values of $f_1$, $f_\alpha$, $f_\beta$, and $f_\gamma$ can be calculated for each member of the sample, where we have written $f_1$ for $f(a_1, b_1, c_1, X_i)$. From Taylor's theorem, the original regression relation

$$Y_i = f(\alpha, \beta, \gamma, X_i) + \varepsilon_i$$

may therefore be written, approximately,

$$Y_i \doteq f_1 + (\alpha - a_1) f_\alpha + (\beta - b_1) f_\beta + (\gamma - c_1) f_\gamma + \varepsilon_i \qquad (15.7.1)$$

Now write

$$Y_{res} = Y - f_1; \quad X_1 = f_\alpha; \quad X_2 = f_\beta; \quad X_3 = f_\gamma$$

From equation 15.7.1,

$$Y_{res} \doteq (\alpha - a_1) X_1 + (\beta - b_1) X_2 + (\gamma - c_1) X_3 + \varepsilon_i \qquad (15.7.2)$$

The variate $Y_{res}$ is the residual of $Y$ from the first approximation. The relation (15.7.2) represents an ordinary linear regression of $Y_{res}$ on the variates $X_1$, $X_2$, $X_3$, the regression coefficients being $(\alpha - a_1)$, $(\beta - b_1)$, and $(\gamma - c_1)$. If the relation (15.7.2) held exactly instead of approximately, the computation of the sample regression of $Y_{res}$ on $X_1$, $X_2$, $X_3$ would give the regression coefficients $(\hat\alpha - a_1)$, $(\hat\beta - b_1)$, and $(\hat\gamma - c_1)$, from which the correct least squares estimates $\hat\alpha$, $\hat\beta$, and $\hat\gamma$ would be obtained at once.

Since relation (15.7.2) is approximate, the fitting of this regression yields second approximations $a_2$, $b_2$, and $c_2$ to $\hat\alpha$, $\hat\beta$, $\hat\gamma$, respectively. We then recalculate $f$, $f_\alpha$, $f_\beta$, and $f_\gamma$ at the point $a_2$, $b_2$, $c_2$, finding a new $Y_{res}$ and new variates $X_1$, $X_2$, and $X_3$. The sample regression of this $Y_{res}$ on $X_1$, $X_2$, and $X_3$ gives the regression coefficients $(a_3 - a_2)$, $(b_3 - b_2)$, and $(c_3 - c_2)$, from which third approximations $a_3$, $b_3$, $c_3$ to $\hat\alpha$, $\hat\beta$, $\hat\gamma$ are found, and so on.
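The cycle of approximations just described is what is now usually called the Gauss-Newton method. The following is a minimal sketch for the asymptotic regression $\alpha - \beta\gamma^X$, assuming NumPy is available. The function names are hypothetical, and the fixed iteration limit and tolerance stand in for the judgment "small enough to be negligible" in the text. Each pass regresses $Y_{res}$ on $X_1 = f_\alpha$, $X_2 = f_\beta$, $X_3 = f_\gamma$ with no separate constant, since $f_\alpha = 1$ already plays that role.

```python
import numpy as np


def asymptotic_f(a, b, c, X):
    """f = a - b*c**X and the n x 3 matrix (f_alpha, f_beta, f_gamma) at (a, b, c)."""
    f = a - b * c ** X
    derivs = np.column_stack([
        np.ones_like(X),          # f_alpha
        -(c ** X),                # f_beta
        -b * X * c ** (X - 1),    # f_gamma
    ])
    return f, derivs


def successive_approximations(X, Y, a, b, c, max_iter=50, tol=1e-10):
    """Hypothetical helper: repeat the regression of Y_res on X1, X2, X3."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    for _ in range(max_iter):
        f, Z = asymptotic_f(a, b, c, X)
        y_res = Y - f
        # Coefficients estimate (alpha - a), (beta - b), (gamma - c)
        delta = np.linalg.lstsq(Z, y_res, rcond=None)[0]
        a, b, c = a + delta[0], b + delta[1], c + delta[2]
        if np.max(np.abs(delta)) < tol:
            break
    f, _ = asymptotic_f(a, b, c, X)
    return a, b, c, float(np.sum((Y - f) ** 2))   # estimates and sum of Y_res^2
```

For example, with error-free values $Y = 10 - 6(0.5)^X$ at $X = 0, \ldots, 6$ and first approximations $a_1 = 10.5$, $b_1 = 5.5$, $c_1 = 0.55$, the iterations return the generating values and a residual sum of squares of essentially zero.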

If the process is effective, the sum of squares of the residuals, $\sum Y_{res}^2$, should decrease steadily at each stage, the decreases becoming small as the least-squares solution is approached. In practice, the calculations are stopped when the decrease in $\sum Y_{res}^2$ and the changes in $a$, $b$, and $c$ are considered small enough to be negligible. The mean square residual is

$$s^2 = \sum Y_{res}^2 / (n - k),$$

where $k$ is the number of parameters that have been estimated (in our example, $k = 3$). With non-linear regression, $s^2$ is not an unbiased estimate of $\sigma^2$, though it tends to become unbiased as $n$ becomes large.

Approximate standard errors of the estimates $\hat\alpha$, $\hat\beta$, $\hat\gamma$ are obtained in the usual way from the Gauss multipliers in the final multiple regression that was computed. Thus,

$$\text{s.e.}(\hat\alpha) = s\sqrt{c_{11}}; \quad \text{s.e.}(\hat\beta) = s\sqrt{c_{22}}; \quad \text{s.e.}(\hat\gamma) = s\sqrt{c_{33}}$$

Approximate confidence limits for $\alpha$ are given by $(\hat\alpha \pm t s\sqrt{c_{11}})$, where $t$ has $(n - 3)$ d.f.
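A sketch of these last steps, assuming NumPy. We take the Gauss multipliers $c_{ij}$ to be the elements of the inverse of the sums of squares and products of the variates in the final regression (here the matrix `Z` of partial derivatives, whose first column is all ones in the asymptotic regression). The value of $t$ must still be read from a t-table with $n - k$ d.f.; the function names are illustrative only.

```python
import numpy as np


def nonlinear_standard_errors(y_res, Z, k):
    """Illustrative helper: s^2, approximate s.e.'s, and the c_ij.

    y_res : residuals Y - f at the final estimates
    Z     : n x k matrix of the variates X1, ..., Xk used in the last regression
    """
    y_res = np.asarray(y_res, dtype=float)
    n = y_res.size
    s2 = float(y_res @ y_res) / (n - k)        # s^2 = sum(Y_res^2) / (n - k)
    C = np.linalg.inv(Z.T @ Z)                 # Gauss multipliers c_ij
    se = np.sqrt(s2 * np.diag(C))              # s * sqrt(c_ii)
    return s2, se, C


def confidence_limits(estimate, se, t):
    """Approximate limits estimate +/- t * s.e., with t taken from a t-table."""
    return estimate - t * se, estimate + t * se
```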

If several stages in the approximation are required, the calculations 
become tedious on a desk machine, since a multiple regression must be 
worked out at each stage. With the commonest non-linear relations, 
however, the computations lend themselves readily to programming on an 
electronic computer. Investigators with access to a computing center 
are advised to find out whether a program is available or can be con- 
structed. If the work must be done on a desk machine, the importance of 
a good first approximation is obvious. 

15.8 — Fitting an asymptotic regression. The population regression function will be written (using the symbol $\rho$ in place of $\gamma$)

$$f(\alpha, \beta, \rho, X) = \alpha + \beta(\rho^X) \qquad (15.8.1)$$

If $0 < \rho < 1$ and $\beta$ is negative, this curve has the form shown in figure 15.1.1(c), p. 448, rising from the value $(\alpha + \beta)$ at $X = 0$ to the asymptote $\alpha$ as $X$ becomes large. If $0 < \rho < 1$ and $\beta$ is positive, the curve declines from the value $(\alpha + \beta)$ at $X = 0$ to an asymptote $\alpha$ when $X$ is large.

Since the function is non-linear only as regards the parameter $\rho$, the method of successive approximation described in the preceding section simplifies a little. Let $r_1$ be a first approximation to $\rho$. By Taylor's theorem,

$$\alpha + \beta(\rho^X) \doteq \alpha + \beta(r_1^X) + \beta(\rho - r_1)(X r_1^{X-1})$$

Write $X_0 = 1$, $X_1 = r_1^X$, $X_2 = X r_1^{X-1}$. If we fit the sample regression

$$\hat Y = a X_0 + b X_1 + c X_2 \qquad (15.8.2)$$

it follows that $a$, $b$ are second approximations to the least-squares estimates $\hat\alpha$, $\hat\beta$ of $\alpha$ and $\beta$ in (15.8.1), while

$$c = b(r_2 - r_1),$$

so that

$$r_2 = r_1 + c/b \qquad (15.8.3)$$

is the second approximation to $\rho$.
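One stage of (15.8.2) and (15.8.3) amounts to a single three-variate regression. The following is a minimal sketch, assuming NumPy; `next_rho` is a hypothetical name. Repeating the call with the new value of $r$ gives the third and later approximations.

```python
import numpy as np


def next_rho(X, Y, r):
    """Hypothetical helper: regress Y on X0 = 1, X1 = r**X, X2 = X*r**(X - 1)
    as in (15.8.2) and return a, b and the next approximation r + c/b (15.8.3)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    design = np.column_stack([np.ones_like(X), r ** X, X * r ** (X - 1)])
    a, b, c = np.linalg.lstsq(design, Y, rcond=None)[0]
    return a, b, r + c / b


# Usage: error-free data Y = 20 + 15(0.6)^X, starting from r1 = 0.55
X = np.arange(6)
Y = 20 + 15 * 0.6 ** X
r = 0.55
for _ in range(8):
    a, b, r = next_rho(X, Y, r)
```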

The commonest case is that in which the values of $X$ change by unity (e.g., $X = 0, 1, 2, \ldots$ or $X = 5, 6, 7, \ldots$) or can be coded to do so. Denote the corresponding $Y$ values by $Y_0, Y_1, Y_2, \ldots, Y_{n-1}$. Note that the value of $X$ corresponding to $Y_0$ need not be 0. For $n = 4$, 5, 6, and 7, good first approximations to $\rho$, due to Patterson (7), are as follows:

$n = 4$: $r_1 = (4Y_3 + Y_2 - 5Y_1)/(4Y_2 + Y_1 - 5Y_0)$

$n = 5$: $r_1 = (4Y_4 + 3Y_3 - Y_2 - 6Y_1)/(4Y_3 + 3Y_2 - Y_1 - 6Y_0)$

$n = 6$: $r_1 = (4Y_5 + 4Y_4 + 2Y_3 - 3Y_2 - 7Y_1)/(4Y_4 + 4Y_3 + 2Y_2 - 3Y_1 - 7Y_0)$

$n = 7$: $r_1 = (Y_6 + Y_5 + Y_4 - Y_2 - 2Y_1)/(Y_5 + Y_4 + Y_3 - Y_1 - 2Y_0)$
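Each of Patterson's formulas applies one set of weights, summing to zero, to $Y_1, \ldots, Y_{n-1}$ in the numerator and to $Y_0, \ldots, Y_{n-2}$ in the denominator. Because the weights sum to zero, $\alpha$ cancels, and for error-free data the ratio returns $\rho$ exactly. The sketch below (pure Python; `patterson_r1` is a hypothetical name) encodes the four formulas that way.

```python
def patterson_r1(Y):
    """Hypothetical helper: Patterson's first approximation to rho for
    n = 4, 5, 6, or 7 Y values at equally spaced X (Y[0] at the smallest X)."""
    weights = {                      # weights on Y_1..Y_{n-1} (numerator)
        4: (-5, 1, 4),               # and on Y_0..Y_{n-2} (denominator)
        5: (-6, -1, 3, 4),
        6: (-7, -3, 2, 4, 4),
        7: (-2, -1, 0, 1, 1, 1),
    }
    n = len(Y)
    if n not in weights:
        raise ValueError("formulas are given here only for n = 4, 5, 6, 7")
    w = weights[n]
    num = sum(wi * y for wi, y in zip(w, Y[1:]))
    den = sum(wi * y for wi, y in zip(w, Y[:-1]))
    return num / den
```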

In a later paper (8), Patterson gives improved first approximations for sample sizes from $n = 4$ to $n = 12$. The value of $r_1$, obtained by solving a quadratic equation, is remarkably good in our experience.

In an illustration given by Stevens (9), table 15.8.1 shows six consecutive readings of a thermometer at half-minute intervals after lowering it into a refrigerated hold.

From Patterson's formula (above) for $n = 6$, we find $r_1 = (-10.42)/(-18.86) = 0.552$. Taking $r_1 = 0.55$, compute the sample values of $X_1$ and $X_2$ and insert them in table 15.8.1. The matrix of sums of squares and
where $a_1$, $b_1$ are given by the linear regression of $Y$ on $X_1$. In the preceding calculations, $a_1$ and $b_1$ were not computed, since they are not needed in finding the second approximation. However, by the usual rules for linear regression, $\sum Y_{res}^2$ from the first approximation is given by

$$\sum Y^2 - (\sum Y)^2/n - (\sum y x_1)^2 / \sum x_1^2 \qquad (15.8.7)$$

where, as usual, $x_1 = X_1 - \bar X_1$. When the curve fits closely, as in this example, ample decimals must be carried in this calculation, as Stevens (9) has warned. Alternatively, we can compute $a_1$ and $b_1$ in (15.8.6) and hence $Y - \hat Y_1$, obtaining the residual sum of squares directly. With the number of decimals that we carried, we obtained 0.0988 by formula 15.8.7 and 0.0990 by the direct method, the former figure being the more accurate.
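The two routes to the first-approximation residual sum of squares can be compared directly. This is a sketch assuming NumPy, with a hypothetical function name; the data in the check are made up, since table 15.8.1 is not reproduced here. In double-precision arithmetic the two results agree to rounding error. Formula (15.8.7) subtracts nearly equal quantities, however, which is why ample decimals are needed when it is worked by hand.

```python
import numpy as np


def rss_first_approximation(X, Y, r1):
    """Hypothetical helper: residual SS after regressing Y on X1 = r1**X, two ways."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X1 = r1 ** X
    x1 = X1 - X1.mean()                   # deviations x1 = X1 - mean(X1)
    y = Y - Y.mean()                      # deviations y = Y - mean(Y)
    # Formula (15.8.7): sum Y^2 - (sum Y)^2 / n - (sum y*x1)^2 / sum x1^2
    by_formula = Y @ Y - Y.sum() ** 2 / Y.size - (y @ x1) ** 2 / (x1 @ x1)
    # Direct method: fit Y1_hat = a1 + b1*X1 and sum the squared residuals
    b1 = (y @ x1) / (x1 @ x1)
    a1 = Y.mean() - b1 * X1.mean()
    direct = np.sum((Y - (a1 + b1 * X1)) ** 2)
    return float(by_formula), float(direct)
```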

For the second approximation, compute the powers of $r_2 = 0.55187$, and hence find $\hat Y_2$ by (15.8.5). The values of $\hat Y_2$ and of $Y - \hat Y_2$ are shown in table 15.8.1. The sum of squares of residuals is 0.0973. The decrease from the first approximation (0.0988 to 0.0973) is so small that we may safely stop with the second approximation. Further approximations lead to a minimum of 0.0972.

The residual mean square for the second approximation is $s^2 = 0.0973/3 = 0.0324$, with $n - 3 = 3$ d.f. Approximate standard errors for the estimated parameters are (using the inverse matrix):

$$\text{s.e.}(a_2) = s\sqrt{c_{11}} = \pm 0.23; \quad \text{s.e.}(b_2) = s\sqrt{c_{22}} = \pm 0.26;$$

$$\text{s.e.}(r_2) = s\sqrt{c_{33}}/b_2 = 0.226/26.82 = \pm 0.0084$$

Strictly speaking, the values of the $c_{ij}$ should be calculated for $r = 0.55187$ instead of $r = 0.55$, but the above results are close enough. Further, since $r_2 - r_1 = c/b$, a better approximation to the standard error of $r_2$ is given by the formula for the standard error of a ratio:

$$\text{s.e.}(r_2) \doteq s\sqrt{\frac{c_{33}}{b^2} - \frac{2c\,c_{23}}{b^3} + \frac{c^2 c_{22}}{b^4}}$$

In nearly all cases the term $c_{33}/b^2$ in the square root dominates, reducing the result to $s\sqrt{c_{33}}/b$.
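The ratio formula above is the usual first-order (delta-method) approximation for $c/b$, with $\text{var}(b) = s^2 c_{22}$, $\text{var}(c) = s^2 c_{33}$, and $\text{cov}(b, c) = s^2 c_{23}$. The sketch below compares it with the shortcut $s\sqrt{c_{33}}/b$; the function names and the numbers in the check are illustrative, not taken from table 15.8.1.

```python
import math


def se_ratio_r2(s, b, c, c22, c23, c33):
    """Illustrative helper: first-order s.e. of c/b from the Gauss multipliers."""
    return s * math.sqrt(c33 / b**2 - 2 * c * c23 / b**3 + c**2 * c22 / b**4)


def se_r2_shortcut(s, b, c33):
    """Keep only the dominant term: s * sqrt(c33) / |b|."""
    return s * math.sqrt(c33) / abs(b)
```

When $c$ is small relative to $b$, as it is once $r_1$ is close to $\hat\rho$, the two results differ by well under one per cent.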

When $X$ has the values $0, 1, 2, \ldots, (n - 1)$, desk machine calculation of the second approximation is much shortened by auxiliary tables. The $c_{ii}$ and $c_{ij}$ in the $3 \times 3$ inverse matrix that we must compute at each stage depend only on $n$ and $r_1$. Stevens (9) tabulated these values for $n = 5$, 6, 7. With these tables, the user finds the first approximation $r_1$ and computes the sample values of $X_1$ and $X_2$ and the quantities $\sum Y$, $\sum X_1 Y$, $\sum X_2 Y$. The values of the $c_{ij}$ corresponding to $r_1$ are then read from Stevens' tables, and the second approximations are obtained rapidly as in (15.8.4) above. Hiorns (10) has tabulated the inverse matrix for $r$ going by 0.01 from 0.1 to 0.9 and for sample sizes from 5 to 50.
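Why such tables are possible can be seen in a short sketch, assuming NumPy. The inverse matrix is built from $X_0 = 1$, $X_1 = r_1^X$, $X_2 = X r_1^{X-1}$ alone, so no $Y$ values enter it. We assume here that the second approximations follow from the usual normal-equations solution, (coefficients) = (inverse matrix) × ($\sum Y$, $\sum X_1 Y$, $\sum X_2 Y$). Equation (15.8.4) itself is not reproduced in this excerpt, and the function names are hypothetical.

```python
import numpy as np


def tabulated_inverse(n, r1):
    """Hypothetical helper: (c_ij) for X0 = 1, X1 = r1**X, X2 = X*r1**(X - 1),
    X = 0, ..., n - 1.  Depends only on n and r1, so it can be tabled once."""
    X = np.arange(n, dtype=float)
    Z = np.column_stack([np.ones(n), r1 ** X, X * r1 ** (X - 1)])
    return np.linalg.inv(Z.T @ Z)


def second_approximation_from_sums(C, sum_Y, sum_X1Y, sum_X2Y, r1):
    """a, b, c from the tabled c_ij and the three sums; then r2 = r1 + c/b."""
    a, b, c = C @ np.array([sum_Y, sum_X1Y, sum_X2Y])
    return a, b, r1 + c / b
```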
EXAMPLE 15.8.1 — In an experiment on wheat in Australia, fertilizers were applied at a series of levels with these resulting yields.


Level X:      0      10      20      30      40
Yield Y:    26.2    30.4    36.3    37.8    38.6


Fit a Mitscherlich equation. Ans. Patterson's formula gives $r_1 \doteq 0.40$. The second approximation is $r_2 \doteq 0.40026$, but the residual sum of squares is practically the same as for the first approximation, which is $\hat Y = 38.679 - 12.425(0.4)^X$.


EXAMPLE 15.8.2 — In a chemical reaction, the amount of nitrogen pentoxide decom- 
posed at various times after the start of the reaction was as follows (12). 






★ CHAPTER SIXTEEN

Two-way classifications with unequal numbers and proportions


16.1 — Introduction. For one reason or another the numbers of observations in the individual cells (sub-classes) of a multiple classification may be unequal. This is the situation in many non-experimental studies, in which the investigator classifies his sample according to the factors or variables of interest, exercising no control over the way in which the numbers fall. With a one-way classification, the handling of the "unequal numbers" case was discussed in section 10.12. In this chapter we present methods for analyzing a two-way classification. The related problem of analyzing a proportion in a two-way table will be taken up also.

The complications introduced by unequal sub-class numbers can be illustrated by a simple example. Two diets were compared on samples of 10 rats. As it happened, 8 of the 10 rats on Diet 1 were females, while only 2 of the 10 rats on Diet 2 were females. Table 16.1.1 shows the sub-class totals for gains in weight and the sub-class numbers. The 8 females on Diet 1 gained a total of 160 units, and so on.


TABLE 16.1.1
Total Gains in Weight and Sub-class Numbers (Artificial Data)

                        Females    Males    Sums    Means
Diet 1     Totals         160        60      220      22
           Numbers          8         2       10
Diet 2     Totals          30       200      230      23
           Numbers          2         8       10
Sums       Totals         190       260      450
           Numbers         10        10       20
Means                      19        26               22.5


From these data we obtain the row totals and means, and likewise the 
column totals and means. From the row means, it looks as if Diet 2 had 
a slight advantage over Diet 1, 23 against 22. In the column means, males show greater gains than females, 26 against 19.

The sub-class means per rat tell a different story. 



             Female    Male
Diet 1         20       30
Diet 2         15       25


Diet 1 is superior by 5 units in both Females and Males. Further, 
Males gain 10 units more than Females under both diets, as against the 
estimate of 7 units obtained from the overall means. 

Why do the row and column means give distorted results? Clearly, 
because of the inequality in the sub-class numbers. The poorer feed, 
Diet 2, had an excess of the faster-growing males. Similarly, the compari- 
son of Male and Female means is biased because most of the males were 
on the inferior diet. 

If we attempt to compute the analysis of variance by elementary methods, this also runs into difficulty. From table 16.1.1 the sum of squares between sub-classes is correctly computed as

$$\frac{(160)^2}{8} + \frac{(60)^2}{2} + \frac{(30)^2}{2} + \frac{(200)^2}{8} - \frac{(450)^2}{20} = 325 \quad (3 \text{ d.f.})$$

The sum of squares for Diets, $(230 - 220)^2/20$, is 5, and that for Sex, $(260 - 190)^2/20$, is 245, leaving an Interaction sum of squares of 75. But from the cell means there is obviously no interaction; the difference between the Diet means is the same for Males as for Females. In a correct analysis, the Interaction sum of squares should be zero.
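The misleading "elementary" analysis is easy to reproduce. In the following pure-Python sketch, `elementary_ss` is a hypothetical helper. It divides each squared sub-class, row, and column total by its number of observations, subtracts the correction term, and finds the interaction by subtraction. With the numbers of table 16.1.1 it yields the spurious Interaction sum of squares of 75.

```python
def elementary_ss(totals, numbers):
    """Hypothetical helper: (sub-classes, rows, columns, interaction) sums of
    squares, each squared total divided by its number, minus the correction C."""
    row_t = [sum(r) for r in totals]
    row_n = [sum(r) for r in numbers]
    col_t = [sum(c) for c in zip(*totals)]
    col_n = [sum(c) for c in zip(*numbers)]
    C = sum(row_t) ** 2 / sum(row_n)
    cells = sum(t * t / n for tr, nr in zip(totals, numbers)
                for t, n in zip(tr, nr)) - C
    rows = sum(t * t / n for t, n in zip(row_t, row_n)) - C
    cols = sum(t * t / n for t, n in zip(col_t, col_n)) - C
    return cells, rows, cols, cells - rows - cols


# Table 16.1.1: rows are diets, columns are (females, males)
print(elementary_ss([[160, 60], [30, 200]], [[8, 2], [2, 8]]))
# (325.0, 5.0, 245.0, 75.0)
```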

For a correct analysis of a two-way table the following approach is suggested:

1. First test for interactions: methods of doing this will be described presently.

2a. If interactions appear negligible, this means that an additive model

$$\bar X_{ij.} = \mu + \alpha_i + \beta_j + \bar\varepsilon_{ij.}$$

is a satisfactory fit, where $\bar X_{ij.}$ is the mean of the $n_{ij}$ observations in the $i$th row and $j$th column. Proceed to find the best unbiased estimates of the $\alpha_i$ and $\beta_j$.

2b. If interactions are substantial, examine the row effects separately 
in each column, and vice versa, with a view to understanding the nature of 
the interactions and writing a summary of the results. The overall row 
and column effects become of less interest, since the effect of each factor 
depends on the level of the other factor. 

Unfortunately, with unequal cell numbers the exact test of the null 
hypothesis that interactions are absent requires the solution of a set of 
linear equations like those in a multiple regression. Consequently, before presenting the exact test (section 16.7) we first describe some quicker methods that are often adequate. When interactions are large, this fact may be obvious by inspection, or can sometimes be verified by one or two t-tests, as illustrated in section 16.2. Also, the exact test can be made by simple methods if the cell numbers $n_{ij}$ are (i) equal, (ii) equal within any row or within any column, or (iii) proportional, that is, in the same proportion within any row. If the actual cell numbers can be approximated reasonably well by one of these cases, an approximate analysis is obtained by using the actual cell means, but replacing the cell numbers $n_{ij}$ by the approximations. The three cases will be illustrated in turn in sections 16.2, 16.3, and 16.4.

The fact that elementary methods of analysis still apply when the cell numbers are proportional is illustrated in table 16.1.2. In this, the cell means are exactly the same as in table 16.1.1, but males and females are now in the ratio 1:3 in each diet, there being 4 males and 12 females on Diet 1 and 1 male and 3 females on Diet 2. Note that the overall row means show a superiority of 5 units for Diet 1, just as the cell means do.


TABLE 16.1.2
Example of Proportional Sub-class Numbers

Analysis of Variance
Correction term C = (430)²/20 = 9,245

                        Degrees of Freedom    Sum of Squares
Between sub-classes             3             (240)²/12 + (120)²/4 + (45)²/3 + (25)²/1 - C = 455
Rows                            1             (360)²/16 + (70)²/4 - C = 80
Columns                         1             (145)²/5 + (285)²/15 - C = 375
Interactions                    1             By subtraction = 0
Similarly, the overall column means show that the males gained 10 units more per animal than females. In the analysis of variance, the Interactions sum of squares is now identically zero.
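The same elementary bookkeeping, applied to the proportional numbers of table 16.1.2, gives an Interaction sum of squares of exactly zero. The row means also recover the 5-unit difference between diets. This is a pure-Python sketch; the helper is repeated so that the block stands alone, and its name is hypothetical.

```python
def elementary_ss(totals, numbers):
    """Hypothetical helper: squared totals over their numbers, minus C."""
    row_t, row_n = [sum(r) for r in totals], [sum(r) for r in numbers]
    col_t, col_n = [sum(c) for c in zip(*totals)], [sum(c) for c in zip(*numbers)]
    C = sum(row_t) ** 2 / sum(row_n)
    cells = sum(t * t / n for tr, nr in zip(totals, numbers)
                for t, n in zip(tr, nr)) - C
    rows = sum(t * t / n for t, n in zip(row_t, row_n)) - C
    cols = sum(t * t / n for t, n in zip(col_t, col_n)) - C
    return cells, rows, cols, cells - rows - cols


means = [[20, 30], [15, 25]]          # cell means; rows = diets, cols = (female, male)
numbers = [[12, 4], [3, 1]]           # females : males = 3 : 1 in each diet
totals = [[m * n for m, n in zip(mr, nr)] for mr, nr in zip(means, numbers)]
cells, rows, cols, inter = elementary_ss(totals, numbers)     # 455, 80, 375, 0
row_means = [sum(t) / sum(n) for t, n in zip(totals, numbers)]  # 22.5, 17.5
```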

16.2 — Unweighted analysis of cell means. Let $X_{ijk}$ denote the $k$th observation in the cell that is in the $i$th row and $j$th column, while $\bar X_{ij.}$ is the cell mean, based on $n_{ij}$ observations. In this method the $\bar X_{ij.}$ are treated as if they were all based on the same number of observations when computing the analysis of variance. The only new feature is how to include the Within-cells mean square

$$s^2 = \sum\sum\sum (X_{ijk} - \bar X_{ij.})^2 \Big/ \sum\sum (n_{ij} - 1)$$

in the analysis of variance.

With fixed effects, the general model for a two-way classification may be written

$$X_{ijk} = \mu + \alpha_i + \beta_j + I_{ij} + \varepsilon_{ijk} \qquad (16.2.1)$$

where $\alpha_i$ and $\beta_j$ are the additive row and column effects, respectively. The $I_{ij}$ are population parameters representing the interactions. The $I_{ij}$ sum to zero over any row and over any column, since they measure the extent to which the additive row and column effects fail to fit the data in the body of the two-way table. The $\varepsilon_{ijk}$ are independent random residuals or deviations, usually assumed to be normally distributed with zero means and variance $\sigma^2$. It follows from 16.2.1 that for a cell mean,

$$\bar X_{ij.} = \mu + \alpha_i + \beta_j + I_{ij} + \bar\varepsilon_{ij.},$$

where $\bar\varepsilon_{ij.}$ is the mean of $n_{ij}$ deviations.

The variance of $\bar X_{ij.}$ is $\sigma^2/n_{ij}$. Consequently, if there are $a$ rows and $b$ columns, the average variance of a cell mean is

$$\frac{\sigma^2}{ab}\left(\frac{1}{n_{11}} + \frac{1}{n_{12}} + \cdots + \frac{1}{n_{ab}}\right) = \frac{\sigma^2}{n_h},$$

where $n_h$ is known in mathematics as the harmonic mean of the $n_{ij}$. A table of reciprocals helps in its calculation. The Within-cell mean square is entered in the analysis of variance as $s^2/n_h$.

Our example (table 16.2.1) comes from an experiment (1) in which 3 strains of mice were inoculated with 3 isolations (i.e., different types) of the mouse typhoid organism. The $n_{ij}$ and the $\bar X_{ij.}$ (mean days-to-death) are shown for each cell. The unweighted analysis of variance is given under the table. From the original data, not shown here, $s^2$ is 5.015 with 774 d.f. Since $1/n_h$ was found to be 0.01678, the Within-cells mean square is entered as (0.01678)(5.015) = 0.0841 in the analysis of variance table.
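The harmonic mean and the resulting Within-cells entry take only a few lines. This is a pure-Python sketch using the cell numbers of table 16.2.1; the helper name is ours.

```python
def harmonic_mean(values):
    """Hypothetical helper: n_h = (number of cells) / sum(1 / n_ij)."""
    return len(values) / sum(1 / v for v in values)


cells = [34, 31, 33, 66, 78, 113, 107, 133, 188]   # n_ij of table 16.2.1
n_h = harmonic_mean(cells)                          # about 59.61
within_entry = 5.015 / n_h                          # s^2 / n_h, about 0.0841
```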

The unweighted analysis may be used either as the definitive analysis, or merely as a quick initial test for interactions. As a final analysis the unweighted method is adequate only if the disparity in the $n_{ij}$ is small, say within a 2 to 1 ratio with most cells agreeing more closely. Table 16.2.1 does not come near to meeting this restriction: the $n_{ij}$ range from 31 to 188. However, this experiment is one in which the presence of interactions would be suspected from a preliminary glance at the data. It looks as if strain Ba was about equally susceptible to all three isolations, while strains RI and Z were more resistant to isolations 11C and DSC1 than to 9D. In this example the unweighted analysis would probably be used only to check this initial impression that an additive model does not apply. The F-ratio for Interactions is 0.8004/0.0841 = 9.51 with 4 and 774 d.f., significant at the 1% level. Since the additive model is rejected, no comparisons among row and column means seem appropriate.

TABLE 16.2.1
Cell Numbers and Mean Days-to-Death in Three Strains of Mice Inoculated With Three Isolations of the Typhoid Bacillus

                                Strain of Mice
Isolation                 RI          Z          Ba         Sums
9D         n              34         31          33
           Mean       4.0000     4.0323      3.7576      11.7899
11C        n              66         78         113
           Mean       6.4545     6.7821      4.3097      17.5463
DSC1       n             107        133         188
           Mean       6.6262     7.8045      4.1277      18.5584
Sums                 17.0807    18.6189     12.1950      47.8946

Analysis of Variance of Unweighted Means

                  Degrees of Freedom    Sum of Squares    Mean Square
Isolations                2                 8.8859
Strains                   2                 7.5003
Interactions              4                 3.2014          0.8004**
Within cells            774                                 0.0841

1/n_h = (1/9)(1/34 + 1/31 + ... + 1/188) = 0.01678;   n_h = 59.61
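The unweighted analysis treats the nine cell means as a two-way table with one observation per cell. The following pure-Python sketch (hypothetical function name) reproduces the sums of squares under table 16.2.1 and the F-ratio of about 9.5. Small differences in the last digit come from rounding in the published means.

```python
def unweighted_means_anova(means):
    """Hypothetical helper: rows, columns, interaction SS with each cell mean
    counted as a single observation (method of unweighted means)."""
    a, b = len(means), len(means[0])
    C = sum(map(sum, means)) ** 2 / (a * b)
    ss_rows = sum(sum(r) ** 2 for r in means) / b - C
    ss_cols = sum(sum(c) ** 2 for c in zip(*means)) / a - C
    ss_cells = sum(x * x for r in means for x in r) - C
    return ss_rows, ss_cols, ss_cells - ss_rows - ss_cols


means = [[4.0000, 4.0323, 3.7576],   # 9D:   RI, Z, Ba
         [6.4545, 6.7821, 4.3097],   # 11C
         [6.6262, 7.8045, 4.1277]]   # DSC1
ss_iso, ss_str, ss_int = unweighted_means_anova(means)
F = (ss_int / 4) / 0.0841            # 0.0841 = s^2 / n_h, as entered in the table
```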

For subsequent t-tests that are made to aid the interpretation of the results, the method of unweighted means, if applied strictly, regards every cell mean as having an error variance 0.0841. This amounts to assuming that every cell has a sample size $n_h = 59.61$. However, comparisons among cell means can be made without assuming the numbers to be equal. For instance, in examining whether strain Z is more resistant to DSC1 than to 11C, the difference in mean days-to-death is $7.8045 - 6.7821 = 1.0224$, with standard error

$$\sqrt{(5.015)\left(\frac{1}{78} + \frac{1}{133}\right)} = \pm 0.319$$
so that the difference is clearly significant by a t-test. Similarly, in testing whether Ba shows any differences from isolation to isolation in mean days-to-death, we have a one-way classification with unequal numbers per class (see example 16.2.1).
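The t-test for strain Z, DSC1 against 11C, can be checked as follows, using the actual cell numbers rather than $n_h$. For contrast, the last line shows the standard error that the strict unweighted rule would assign, $\sqrt{2 \times 0.0841}$. This is a pure-Python sketch of the arithmetic in the text.

```python
import math

s2 = 5.015                                   # within-cells mean square, 774 d.f.
diff = 7.8045 - 6.7821                       # strain Z: DSC1 minus 11C
se = math.sqrt(s2 * (1 / 78 + 1 / 133))      # actual cell numbers: about 0.319
t = diff / se                                # about 3.20
se_unweighted = math.sqrt(2 * 0.0841)        # strict unweighted rule: about 0.41
```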

If interactions had been negligible, main effects would be estimated approximately from the row and column means of the sub-class means. These means can also be assigned correct standard errors. For instance, for 9D the mean, $11.7899/3 = 3.9300$, has a standard error

$$\sqrt{\frac{5.015}{9}\left(\frac{1}{34} + \frac{1}{31} + \frac{1}{33}\right)}$$
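The formula generalizes: the unweighted mean of $b$ cell means has variance $(\sigma^2/b^2)\sum 1/n_{ij}$. The following pure-Python sketch (hypothetical helper name) evaluates it for row 9D, giving roughly ±0.226.

```python
import math


def se_mean_of_cell_means(s2, ns):
    """Hypothetical helper: s.e. of the unweighted mean of b cell means."""
    b = len(ns)
    return math.sqrt(s2 / b ** 2 * sum(1 / n for n in ns))


mean_9D = 11.7899 / 3                                 # about 3.9300
se_9D = se_mean_of_cell_means(5.015, [34, 31, 33])    # about 0.226
```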

In some applications it is suspected that the Within-sub-class variance is not constant from one sub-class to another. Two changes in the approximate method are suggested. In the analysis of variance, compute the Within-classes mean square as the average of the quantities $s_{ij}^2/n_{ij}$, where $s_{ij}^2$ is the mean square within the $i, j$ sub-class. In a comparison $\sum L_{ij} \bar X_{ij.}$ among the sub-class means, compute the standard error as

$$\sqrt{\sum L_{ij}^2 s_{ij}^2 / n_{ij}},$$

using only the sub-classes that enter into the comparison.
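Both modifications are simple to compute. In this pure-Python sketch the function names are hypothetical and the per-cell mean squares are inputs you would supply. When every $s_{ij}^2$ equals the pooled value, the comparison reduces to the earlier Z (DSC1 minus 11C) standard error of about 0.319.

```python
import math


def within_entry_unequal(s2_cells, n_cells):
    """Hypothetical helper: average of s_ij^2 / n_ij over all sub-classes."""
    return sum(s2 / n for s2, n in zip(s2_cells, n_cells)) / len(n_cells)


def se_comparison(L, s2_cells, n_cells):
    """s.e. of sum(L_ij * mean_ij), passing only the sub-classes in the comparison."""
    return math.sqrt(sum(l * l * s2 / n for l, s2, n in zip(L, s2_cells, n_cells)))
```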

EXAMPLE 16.2.1 — Test whether Ba shows any differences from isolation to isolation in mean days-to-death. Ans. The Ba totals are 124, 487, 776, for sample sizes 33, 113, 188. The weighted sum of squares is 8.0565, with 2 d.f. The mean square, 4.028, as compared with the Within-class mean square, 5.015, shows no indication of any difference.
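The answer to Example 16.2.1 is an ordinary one-way "between classes" sum of squares with unequal numbers, $\sum T^2/n - (\sum T)^2/N$. This is a pure-Python sketch; the helper name is ours.

```python
def between_ss(totals, numbers):
    """Hypothetical helper: sum(T^2 / n) - (sum T)^2 / N."""
    T, N = sum(totals), sum(numbers)
    return sum(t * t / n for t, n in zip(totals, numbers)) - T * T / N


ss = between_ss([124, 487, 776], [33, 113, 188])   # Ba totals and sample sizes
ms = ss / 2                                        # 2 d.f.
F = ms / 5.015                                     # below 1: no evidence of a difference
```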


16.3 — Equal numbers within rows. In the mice example (table 16.2.1), an analysis that assumes equal sub-class numbers within each row approximates the actual numbers much more closely than the assumption that all numbers are equal. Since the row total numbers are 98, 257, and 428, we assign sample sizes 33, 86, and 143 (each row total divided by 3 and rounded) to the sub-classes in the respective rows.

In the analysis (table 16.3.1), each sub-class mean is multiplied by the assigned sub-class number to form a corresponding sub-class total. Thus, for Z with 9D, 133.1 = (33)(4.0323). The analysis of variance, given under table 16.3.1, is computed by elementary methods. Each total, when squared, is divided by the assigned sample size.

The F-ratio for Interactions is 8.70, again rejecting the hypothesis of additivity of Isolation and Strain effects. In this example, the assigned numbers agree nearly enough with the actual numbers so that further t-tests may be based on the assigned numbers. If the interactions had been unimportant in this example, the main effects of Isolations and Strains would be satisfactorily estimated from the overall means 3.930, 5.849, and so on, shown in table 16.3.1. (These means were not used in the present calculations.)
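Since table 16.3.1 is not reproduced here, the following pure-Python sketch rebuilds its totals from the cell means and the assigned numbers 33, 86, and 143. It then applies the elementary method; the helper name is hypothetical. The resulting F-ratio agrees with the 8.70 of the text to within rounding of the sub-class totals.

```python
def elementary_ss(totals, numbers):
    """Hypothetical helper: squared totals over their numbers, minus C."""
    row_t, row_n = [sum(r) for r in totals], [sum(r) for r in numbers]
    col_t, col_n = [sum(c) for c in zip(*totals)], [sum(c) for c in zip(*numbers)]
    C = sum(row_t) ** 2 / sum(row_n)
    cells = sum(t * t / n for tr, nr in zip(totals, numbers)
                for t, n in zip(tr, nr)) - C
    rows = sum(t * t / n for t, n in zip(row_t, row_n)) - C
    cols = sum(t * t / n for t, n in zip(col_t, col_n)) - C
    return cells, rows, cols, cells - rows - cols


means = [[4.0000, 4.0323, 3.7576],     # 9D:   RI, Z, Ba
         [6.4545, 6.7821, 4.3097],     # 11C
         [6.6262, 7.8045, 4.1277]]     # DSC1
assigned = [33, 86, 143]               # equal numbers within each row
totals = [[n * m for m in row] for row, n in zip(means, assigned)]
numbers = [[n] * 3 for n in assigned]
cells, ss_iso, ss_str, ss_int = elementary_ss(totals, numbers)
F = (ss_int / 4) / 5.015               # within-cells mean square s^2 = 5.015
```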
row and dividing by the sum of the sub-class numbers in the row, are the least squares estimates of the row main effects, and similarly for columns.

(ii) In computing the analysis of variance, the squared total for any sub-class, row, or column is divided by the corresponding number. The Total sum of squares between sub-classes and the sums of squares for rows and columns are calculated directly, the Interaction sum of squares being found by subtraction.

(iii) The F-ratio of the Interactions mean square to the Within sub-classes mean square gives the exact least squares test of the null hypothesis that there are no interactions.

Two examples will be presented. In table 16.4.1 the classes are Breeds of Swine and Sex of Swine. The sub-class numbers represent approximately the proportions in which the breeds and sexes were brought in for slaughter at the College Meats Laboratory (2). For each breed, males and females are in the proportions 2:1, and for each sex, the breeds are in the proportions 6:15:2:3:5. The data are the percentages of dressed weight to total weight (less 70%). The calculations are given in full under the table. Since the sample represents only a small fraction of the original data, conclusions are tentative. There were differences among breeds but no indication of a sex difference nor of sex-breed interactions. In making comparisons among the breed means, account should of course be taken of the differences in the sample sizes.
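With proportional numbers the whole analysis under table 16.4.1 follows from the sub-class totals, the sub-class numbers, and $\sum X^2$. The following pure-Python sketch (hypothetical helper name) reproduces the figures 122.83, 0.52, 97.38, 24.93, and 457.19.

```python
def proportional_ss(totals, numbers):
    """Hypothetical helper: C and the sub-classes, rows, columns, interaction SS."""
    row_t, row_n = [sum(r) for r in totals], [sum(r) for r in numbers]
    col_t, col_n = [sum(c) for c in zip(*totals)], [sum(c) for c in zip(*numbers)]
    C = sum(row_t) ** 2 / sum(row_n)
    cells = sum(t * t / n for tr, nr in zip(totals, numbers)
                for t, n in zip(tr, nr)) - C
    rows = sum(t * t / n for t, n in zip(row_t, row_n)) - C
    cols = sum(t * t / n for t, n in zip(col_t, col_n)) - C
    return C, cells, rows, cols, cells - rows - cols


# Table 16.4.1: rows = breeds 1-5, columns = (male, female)
totals = [[168.9, 87.6], [362.7, 182.7], [41.6, 27.3], [79.7, 33.1], [110.1, 55.7]]
numbers = [[12, 6], [30, 15], [4, 2], [6, 3], [10, 5]]
C, cells, breeds, sex, inter = proportional_ss(totals, numbers)
total = 14785.62 - C                   # sum X^2 - C
within = total - cells                 # 83 d.f.
```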

In the breed means, the sexes are weighted in the ratio of 2 males to 1 female. The reader may ask: Is this the weighting that we ought to have? The answer depends on the status of the interactions. If interactions are negligible, any weighting, provided that it is the same for every breed, furnishes unbiased estimates of the population differences between breed means. The 2:1 weighting gives the most precise estimates from the available data. If interactions are present, breed differences are not the same for males as for females, so that different weightings produce real differences in results. Usually, as emphasized on several occasions, we do not wish to examine main effects when interactions are present. If we do, a 2:1 weighting is appropriate, when interactions are present, only if it represents the proportions in which males and females appear in the target population of the study, as happens in this example. Equal weighting or some other proportion would be preferred if it were more typical of the population about which the investigator wishes to draw conclusions.

With unequal sub-class numbers the expressions for the expected values of the mean squares in terms of components of variance are complicated. Wilk and Kempthorne (3) have developed formulas for 2- and 3-factor arrangements: the sub-class numbers may be equal or proportional. With 2 factors, let the proportions in factor A be $u_1 : u_2 : \ldots : u_a$ and those in B, $v_1 : v_2 : \ldots : v_b$. The number of observations in the $(i, j)$ sub-class will then be some multiple of $u_i v_j$, say $n u_i v_j$. Note the value of $n$. The mathematical model is as given in 16.2.1, where the $\alpha_i$, $\beta_j$, and



TABLE 16.4.1
Dressing Percentages (Less 70%) of 93 Swine Classified by Breed and Sex
Live Weights 200-219 Pounds

Breed 1, Male (12): 13.3, 12.6, 11.5, 15.4, 12.7, 15.7, 13.2, 15.0, 14.3, 16.5, 15.0, 13.7; ΣX = 168.9
Breed 1, Female (6): 18.2, 11.3, 14.2, 15.9, 12.9, 15.1; ΣX = 87.6
Breed 2, Male (30): 10.9, 3.3, 10.5, 11.6, 15.4, 14.4, 11.6, 14.4, 7.5, 10.8, 10.5, 14.5, 10.9, 13.0, 15.9, 12.8, 14.0, 11.1, 12.1, 14.7, 12.7, 13.1, 10.4, 11.9, 10.7, 14.4, 11.3, 13.0, 12.7, 12.6; ΣX = 362.7
Breed 2, Female (15): 14.3, 15.3, 11.8, 11.0, 10.9, 10.5, 12.9, 12.5, 13.0, 7.6, 12.9, 12.4, 12.8, 10.9, 13.9; ΣX = 182.7
Breed 3, Male (4): 13.6, 13.1, 4.1, 10.8; ΣX = 41.6
Breed 3, Female (2): 12.9, 14.4; ΣX = 27.3
Breed 4, Male (6): 11.6, 13.2, 12.6, 15.2, 14.7, 12.4; ΣX = 79.7
Breed 4, Female (3): 13.8, 14.4, 4.9; ΣX = 33.1
Breed 5, Male (10): 10.3, 10.3, 10.1, 6.9, 13.2, 11.0, 12.2, 13.3, 12.9, 9.9; ΣX = 110.1
Breed 5, Female (5): 12.8, 8.4, 10.6, 13.9, 10.0; ΣX = 55.7

Total: N = 93, ΣX = 1,149.4, ΣX² = 14,785.62
Breed Sums: 1, 256.5; 2, 545.4; 3, 68.9; 4, 112.8; 5, 165.8
Sex Sums: Male, 763.0; Female, 386.4

Correction: C = (ΣX)²/N = (1,149.4)²/93 = 14,205.60
Total: ΣX² - C = 14,785.62 - 14,205.60 = 580.02
Sub-classes: (168.9)²/12 + (87.6)²/6 + ... + (55.7)²/5 - C = 122.83
Within sub-classes: 580.02 - 122.83 = 457.19
Sex: (763.0)²/62 + (386.4)²/31 - C = 0.52
Breeds: (256.5)²/18 + (545.4)²/45 + ... + (165.8)²/15 - C = 97.38
Interaction: 122.83 - 0.52 - 97.38 = 24.93

                            Degrees of Freedom    Sum of Squares    Mean Square
Sex                                 1                   0.52             0.52
Breeds                              4                  97.38            24.34**
Breed-Sex Interactions              4                  24.93             6.23
Within sub-classes                 83                 457.19             5.51

Breed Mean Percentages
Breed          1       2       3       4       5
Mean        84.2    82.1    81.5    82.5    81.1
Number        18      45       6       9      15


$I_{ij}$ may be either fixed or random. Also let:

$$U = \sum u_i, \quad V = \sum v_j, \quad U^* = \frac{\sum u_i^2}{(\sum u_i)^2}, \quad V^* = \frac{\sum v_j^2}{(\sum v_j)^2}$$

The expected values of the mean squares are:

$$E(A) = \sigma^2 + \frac{nUV(1 - U^*)}{a - 1}\left\{\left(V^* - \frac{1}{b}\right)\sigma_{AB}^2 + \sigma_A^2\right\}$$

$$E(B) = \sigma^2 + \frac{nUV(1 - V^*)}{b - 1}\left\{\left(U^* - \frac{1}{a}\right)\sigma_{AB}^2 + \sigma_B^2\right\}$$

$$E(AB) = \sigma^2 + \frac{nUV(1 - U^*)(1 - V^*)}{(a - 1)(b - 1)}\sigma_{AB}^2$$

These results hold when both factors are fixed. If A is random, delete the term in $1/a$ (inside the curly bracket) in E(B). If B is random, delete the term in $1/b$ in E(A). With fixed factors, the variance components are defined as follows:

$$\sigma_A^2 = \sum \alpha_i^2/(a - 1); \quad \sigma_B^2 = \sum \beta_j^2/(b - 1); \quad \sigma_{AB}^2 = \sum I_{ij}^2 / [(a - 1)(b - 1)]$$

For the example, if A denotes sex and B denotes breed:

$a = 2$, $b = 5$; $u_1 = 2$, $u_2 = 1$; $v_1 = 6$, $v_2 = 15$, $v_3 = 2$, $v_4 = 3$, $v_5 = 5$; $n = 1$

$$U = 3; \quad V = 31; \quad U^* = \frac{2^2 + 1^2}{3^2} = 0.556; \quad V^* = \frac{6^2 + 15^2 + 2^2 + 3^2 + 5^2}{31^2} = 0.311$$

Regarding sex and breed as fixed parameters, we find

$$E(A) = \sigma^2 + 4.58\sigma_{AB}^2 + 41.3\sigma_A^2$$

$$E(B) = \sigma^2 + 0.90\sigma_{AB}^2 + 16.0\sigma_B^2$$

$$E(AB) = \sigma^2 + 7.11\sigma_{AB}^2$$
Note that E(A) and E(B) contain terms in the interaction variance, even though all effects are fixed. This happens because when the numbers are proportional, the main effects are weighted means. Although the $I_{ij}$ sum to zero over any row or column, their weighted means are not zero. As a further illustration, you may verify that if A were random in these data, we would have:

$$E(B) = \sigma^2 + 8.90\sigma_{AB}^2 + 16.0\sigma_B^2$$
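The coefficients in these expected mean squares can be computed directly from the proportions. The following pure-Python sketch (hypothetical function name) follows the formulas above, including the rule for dropping the $1/a$ or $1/b$ term when a factor is random. The text's figures were evidently worked from the rounded values $U^* = 0.556$ and $V^* = 0.311$, so agreement is to about the second decimal.

```python
def ems_proportional(u, v, n=1, a_random=False, b_random=False):
    """Hypothetical helper: coefficients of the variance components in
    E(A), E(B), E(AB) when the sub-class numbers are n*u_i*v_j."""
    a, b = len(u), len(v)
    U, V = sum(u), sum(v)
    U_star = sum(x * x for x in u) / U ** 2
    V_star = sum(x * x for x in v) / V ** 2
    kA = n * U * V * (1 - U_star) / (a - 1)
    kB = n * U * V * (1 - V_star) / (b - 1)
    kAB = n * U * V * (1 - U_star) * (1 - V_star) / ((a - 1) * (b - 1))
    # The 1/b term in E(A) is dropped if B is random, the 1/a term in E(B) if A is random
    EA = {"AB": kA * (V_star - (0 if b_random else 1 / b)), "A": kA}
    EB = {"AB": kB * (U_star - (0 if a_random else 1 / a)), "B": kB}
    return EA, EB, {"AB": kAB}


EA, EB, EAB = ems_proportional([2, 1], [6, 15, 2, 3, 5])   # sex (A) by breed (B)
```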

Our second example (table 16.4.2) illustrates the use of analysis by 
proportional numbers as an approximation to the least squares analysis. 
In a sample survey of farm tenancy in an Iowa county (4), it was found that 
farmers had about the same proportions of Owned, Rented, and Mixed 


TABLE 16.4.2
Farm Acres in Corn Classified by Tenure and Soil Productivity
Audubon County, Iowa

Soil                   Owner                    Renter                    Mixed
Class           Observed  Proportional   Observed  Proportional   Observed  Proportional     Sums
I       n          36        36.75          67        62.92          49        52.33           152
        X̄                    32.7                     55.2                     50.6
        ΣX                  1,202                    3,473                    2,648           7,323
II      n          31        33.85          60        57.95          49        48.20           140
        X̄                    36.0                     53.4                     47.1
        ΣX                  1,219                    3,095                    2,270           6,584
III     n          58        54.40          87        93.13          80        77.47           225
        X̄                    30.1                     46.8                     40.1
        ΣX                  1,637                    4,358                    3,107           9,102
Sums    n         125                      214                      178                       517
        ΣX                  4,058                   10,926                    8,025          23,009

Analysis of Variance Using Proportional Numbers

Source of Variation            Degrees of Freedom    Sum of Squares    Mean Square
Soils                                  2                  6,635           3,318*
Tenures                                2                 27,367          13,684**
Interactions                           4                    883             221
Error (from original data)           508                                    830

Means:   Owner 32.5   Renter 51.1   Mixed 45.1
farms in 3 soil fertility classes (section 9.13). Replacement of the actual sub-class numbers by numbers that are proportional should therefore give a good approximation to the least squares analysis. The proportional numbers are calculated from the row and column totals of the actual numbers. Thus, for Renters in Soil Class III, 93.13 = (225)(214)/517. The sub-class means are multiplied by these fictitious numbers to produce the sub-class totals ΣX in table 16.4.2.

The variable being analyzed is the number of acres of corn per farm. 
There are large differences between tenure means, renters and mixed 
owner-renter farmers having more corn than owners. The amount of 
corn is also reduced on Soil Class III. There is no evidence of interactions. 
Since the proportional numbers agree so well with the actual numbers,
an exact least squares analysis of these data is unnecessary. In general,
analysis by proportional numbers should be an adequate approximation
to the least squares analysis if the ratios of the proportional to the actual
cell numbers all lie between 0.75 and 1.3, although this question has not
been thoroughly studied.
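The proportional numbers and the rough 0.75 to 1.3 adequacy check can be reproduced from the margins of the actual sub-class numbers. The following Python sketch is only an illustration; the function name `proportional_numbers` is ours, and the cell counts are transcribed from the farm table (rows are soil classes I to III, columns are Owner, Renter, Mixed).

```python
# Actual sub-class numbers n_ij from the farm table: rows are soil classes
# I, II, III; columns are Owner, Renter, Mixed.
actual = [
    [36, 67, 49],
    [31, 60, 49],
    [58, 87, 80],
]


def proportional_numbers(n):
    """Return n'_ij = (row total)(column total) / (grand total) for every cell."""
    row_tot = [sum(row) for row in n]
    col_tot = [sum(col) for col in zip(*n)]
    grand = sum(row_tot)
    return [[r * c / grand for c in col_tot] for r in row_tot]


prop = proportional_numbers(actual)
print("Renters, Soil Class III:", round(prop[2][1], 2))

# Rough adequacy check from the text: every ratio proportional/actual
# should lie between 0.75 and 1.3.
ratios = [p / a for p_row, a_row in zip(prop, actual) for p, a in zip(p_row, a_row)]
print("ratio range: %.3f to %.3f" % (min(ratios), max(ratios)))
print("approximation adequate:", all(0.75 <= r <= 1.3 for r in ratios))
```

For these data every ratio falls well inside the suggested range, which agrees with the statement that an exact least squares analysis is unnecessary here.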

16.5 — Disproportionate numbers. The 2x2 table. In section 16.7 
the analysis of the R x C table when sub-class numbers are neither equal 
nor proportional will be presented. The 2x2 and the R x 2 table, which 
are simpler to handle and occur frequently, are discussed in this and the 
next section. Table 16.5.1 gives an example (5). The data relate to the 
effects of two hormones on the comb weights of chicks. 


TABLE 16.5.1

Comb Weights (mg.) of Lots of Chicks Injected With Two Sex Hormones

                          Untreated                    Hormone A
                 Number      ΣX     Mean      Number      ΣX     Mean
Untreated           3        240     80          12     1,440    120
Hormone B          12      1,200    100           6       672    112


The Within-classes mean square, computed from the individual
observations, was $s^2 = 811$, with 29 d.f. To test the interaction, compute
it from the sub-class means in the usual way for a 2 x 2 factorial:

$$80 + 112 - 100 - 120 = -28$$

Taking account of the sub-class numbers, the standard error of this esti-
mate is

$$\sqrt{s^2\left(\frac{1}{3} + \frac{1}{12} + \frac{1}{12} + \frac{1}{6}\right)} = \sqrt{811\left(\frac{1}{3} + \frac{1}{12} + \frac{1}{12} + \frac{1}{6}\right)} = \pm 23.25$$

The value of $t$ is $-28/23.25 = -1.20$, with 29 d.f., P about 0.25. We shall
assume interaction unimportant and proceed to compute the main effects
(table 16.5.2).
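The interaction test above can be checked in a few lines. This Python sketch assumes only what the text states: each sub-class mean has variance $s^2/n_{ij}$, the four means are independent, and $s^2 = 811$; the list layout (rows B absent/present, columns A absent/present) is our own choice.

```python
import math

# Sub-class numbers and means from table 16.5.1.
# Rows: Hormone B absent / present; columns: Hormone A absent / present.
n = [[3, 12], [12, 6]]
xbar = [[80.0, 120.0], [100.0, 112.0]]
s2 = 811.0  # Within-classes mean square, 29 d.f.

# Interaction contrast: (neither + both) - (B only) - (A only)
interaction = xbar[0][0] + xbar[1][1] - xbar[1][0] - xbar[0][1]

# Each mean has variance s2 / n_ij and the four means are independent,
# so the variance of the contrast is s2 * sum(1 / n_ij).
se = math.sqrt(s2 * sum(1 / nij for row in n for nij in row))
t = interaction / se
print(interaction, round(se, 2), round(t, 2))
```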
TABLE 16.5.2

Calculation of Main Effects of Hormones A and B

                 Untreated         Hormone A
               n1    Mean1       n2    Mean2      D_A    W_A = n1n2/(n1 + n2)    W_A D_A
Untreated       3      80        12     120        40             2.4                96
Hormone B      12     100         6     112        12             4.0                48
Sum                                                               6.4               144

D_B                    20               -8
W_B                   2.4              4.0        Sum of W_B = 6.4
W_B D_B                48              -32        Sum of W_B D_B = 16

Main effect of A: $\sum W_A D_A / \sum W_A = 144/6.4 = 22.5$
    S.E. $= \sqrt{s^2/\sum W_A} = \sqrt{811/6.4} = \pm 11.26$ (29 d.f.)

Main effect of B: $\sum W_B D_B / \sum W_B = 16/6.4 = 2.5$
    S.E. $= \sqrt{s^2/\sum W_B} = \sqrt{811/6.4} = \pm 11.26$ (29 d.f.)


Consider Hormone A. The differences $D_A$ between the means with
and without A are recorded separately for the two levels of B. These are
the figures 40 and 12. Since interaction is assumed absent, each figure
is an estimate of the main effect of A. But the estimates differ in precision
because of the unequal sub-class numbers. For an estimate derived from
two sub-classes with numbers $n_1$ and $n_2$ the variance is

$$\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = \frac{\sigma^2(n_1 + n_2)}{n_1 n_2}$$

Consequently, the estimate receives a relative weight $W = n_1 n_2/(n_1 + n_2)$.
These weights are computed and recorded. The main effect of A is the
weighted mean of the two estimates, $\sum WD/\sum W$, with s.e. $\pm\sqrt{s^2/\sum W}$.
The main effect of B is computed similarly. The increase in comb weights
due to Hormone A is 22.5 mg. ± 11.26 mg., almost significant at the 5%
level, but Hormone B appears to have little effect.
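The weighting rule just described can be written as one small helper applied twice, once comparing columns within rows (effect of A) and once comparing rows within columns (effect of B). This is a minimal Python sketch using the data of table 16.5.1; the helper name `weighted_effect` is hypothetical.

```python
import math

# Table 16.5.1. Rows: Hormone B absent / present; columns: Hormone A absent / present.
n = [[3, 12], [12, 6]]
xbar = [[80.0, 120.0], [100.0, 112.0]]
s2 = 811.0  # Within-classes mean square, 29 d.f.


def weighted_effect(pairs):
    """pairs: list of ((n1, mean1), (n2, mean2)); returns (effect, s.e.)."""
    sum_w = sum_wd = 0.0
    for (n1, m1), (n2, m2) in pairs:
        w = n1 * n2 / (n1 + n2)  # relative weight = sigma^2 / variance of the difference
        sum_w += w
        sum_wd += w * (m2 - m1)
    return sum_wd / sum_w, math.sqrt(s2 / sum_w)


# Effect of A: difference between columns, separately within each row.
a_pairs = [((n[i][0], xbar[i][0]), (n[i][1], xbar[i][1])) for i in range(2)]
# Effect of B: difference between rows, separately within each column.
b_pairs = [((n[0][j], xbar[0][j]), (n[1][j], xbar[1][j])) for j in range(2)]

effect_a, se_a = weighted_effect(a_pairs)
effect_b, se_b = weighted_effect(b_pairs)
print("A: %.1f +/- %.2f   B: %.1f +/- %.2f" % (effect_a, se_a, effect_b, se_b))
```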

Note: in this example the two values of $W$, 2.4 and 4.0, happen to be
the same for A and B. This arises because two sub-classes are of size
12 and is not generally true. We have not described the analysis of vari-
ance because it is not needed.

16.6 — Disproportionate numbers. The R x 2 table. The data in table
16.6.1 illustrate some of the peculiarities of disproportionate sub-class
numbers (6). In a preliminary analysis of variance, shown under the
table, the Total sum of squares between sub-class means and the sums of
squares for Sexes and Generations were computed by the usual elementary
methods (taking account of the differences in sub-class numbers). The
Interactions sum of squares was then found to be

$$119{,}141 - 114{,}287 - 5{,}756 = -902$$

The Sexes and Generations S.S. add to more than the total S.S. between
sub-classes. This is because differences between the Generation means
are inflated by the inequality in the Sex means, and vice versa.

TABLE 16.6.1

Number, Total Gain, and Mean Gain in Weight of Wistar Rats (Gms. Minus 100)
in Four Successive Generations. Gains During Six Weeks From 28 Days of Age

               Male                  Female
Genera-                                                  W_j =                 D_j =
tion      n1j   ΣX1j    Mean     n2j   ΣX2j   Mean   n1j n2j/(n1j + n2j)   Mean1 - Mean2    W_j D_j
1          21   1,616   76.95     27    257    9.52        11.81               67.43           796.35
2          15     922   61.47     25    352   14.08         9.38               47.39           444.52
3          12     668   55.67     23    196    8.52         7.89               47.15           372.01
4           7     497   71.00     19    129    6.79         5.12               64.21           328.76
Total                                                      34.20                              1,941.64


Preliminary Analysis of Variance

Source of Variation     Degrees of Freedom   Sum of Squares   Mean Square
Sexes                           1               114,287
Generations                     3                 5,756
Interactions                    3                  -902 (!)
Between sub-classes             7               119,141
Within sub-classes            141                                  409


Calculation of Adjusted Generation Means

Generation j    n.j    X.j.     Mean     Estimate of            Adjusted Mean
1               48    1,873    39.02    μ + α1 - δ/16              42.57
2               40    1,274    31.85    μ + α2 - δ/8               38.95
3               35      864    24.69    μ + α3 - 11δ/70            33.61
4               26      626    24.08    μ + α4 - 3δ/13             37.18


In any R x 2 table the correct Interactions S.S. is easily computed
directly. Calculate the observed sex difference $D$ and its weight $W$
separately for each generation (table 16.6.1). The Interactions S.S.
(3 d.f.) is given by

$$\sum WD^2 - \frac{(\sum WD)^2}{\sum W} = (67.43)(796.35) + \cdots + (64.21)(328.76) - \frac{(1{,}941.64)^2}{34.20} = 3{,}181$$

The F-test of Interactions is F = 1,060/409 = 2.59, close to the 5%
level. It looks as if the sex difference was greater in generations 1 and 4
than in generations 2 and 3. There is, however, no a priori reason to antici-
pate that the sex difference would change from generation to generation.
Perhaps the cell means were affected by some extraneous sources of varia-
tion that did not contribute to the variation within cells. For illustration,
we proceed to estimate main effects on the assumption that interactions
are negligible.
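The direct computation of the Interactions S.S. in an R x 2 table, and its F-test, can be sketched in Python as below. As an assumption of this sketch, the differences and weights are recomputed from the raw counts and totals of table 16.6.1 rather than from the rounded tabled $D_j$ and $W_j$, so the result differs from the printed 3,181 by a unit or two.

```python
# Table 16.6.1: (males n, males total gain, females n, females total gain)
generations = [
    (21, 1616, 27, 257),
    (15, 922, 25, 352),
    (12, 668, 23, 196),
    (7, 497, 19, 129),
]
s2_within = 409.0  # Within sub-classes mean square, 141 d.f.

sum_w = sum_wd = sum_wd2 = 0.0
for n1, t1, n2, t2 in generations:
    d = t1 / n1 - t2 / n2        # sex difference D_j, males minus females
    w = n1 * n2 / (n1 + n2)      # weight W_j = sigma^2 / variance of D_j
    sum_w += w
    sum_wd += w * d
    sum_wd2 += w * d * d

# Interactions S.S. = sum W D^2 - (sum W D)^2 / sum W, with R - 1 d.f.
ss_interaction = sum_wd2 - sum_wd ** 2 / sum_w
df = len(generations) - 1
f_ratio = (ss_interaction / df) / s2_within
print("Interactions S.S. = %.0f, F = %.2f on %d and 141 d.f." % (ss_interaction, f_ratio, df))
```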

The estimate of the sex difference in mean gain is

$$\bar{D} = \frac{\sum W_j D_j}{\sum W_j} = \frac{1{,}941.64}{34.20} = 56.77 \text{ gms.}$$

Its S.E. is $s/\sqrt{\sum W} = \sqrt{409/34.2} = 3.46$ gms.

To estimate the Generation effects, note that under the additive model
the population means for males and females in Generation $j$ may be written
as follows:

$$\text{Males: } \mu + \alpha_j + \frac{\delta}{2}, \qquad \text{Females: } \mu + \alpha_j - \frac{\delta}{2}$$

where $\delta$ represents the sex difference, Males minus Females. We start
with the unadjusted mean for each generation and adjust it so as to re-
move the sex effect. Since generation 1 has 21 males and 27 females out
of 48, its unadjusted mean is an unbiased estimate of

$$\mu + \alpha_1 + \frac{21}{48}\left(\frac{\delta}{2}\right) - \frac{27}{48}\left(\frac{\delta}{2}\right) = \mu + \alpha_1 - \frac{\delta}{16}$$

Our estimate of $\delta$ is 56.77 and the unadjusted mean for generation 1 is
39.02. To remove the sex effect, we add 56.77/16 = 3.55, giving 42.57.
These adjustments are made at the foot of table 16.6.1.
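The estimate $\bar{D}$, its standard error, and the adjusted generation means follow mechanically from the counts and totals. In this Python sketch (working from the raw totals of table 16.6.1, which is our choice), the unadjusted mean of generation $j$ estimates $\mu + \alpha_j + \delta(n_{1j} - n_{2j})/(2n_{\cdot j})$, and that term is removed using $\bar{D}$.

```python
import math

# Table 16.6.1: (males n, males total gain, females n, females total gain)
generations = [
    (21, 1616, 27, 257),
    (15, 922, 25, 352),
    (12, 668, 23, 196),
    (7, 497, 19, 129),
]
s2_within = 409.0

w = [n1 * n2 / (n1 + n2) for n1, _, n2, _ in generations]
d = [t1 / n1 - t2 / n2 for n1, t1, n2, t2 in generations]
d_bar = sum(wi * di for wi, di in zip(w, d)) / sum(w)   # weighted sex difference
se_d = math.sqrt(s2_within / sum(w))

adjusted = []
for n1, t1, n2, t2 in generations:
    n = n1 + n2
    unadjusted = (t1 + t2) / n
    # E(unadjusted mean) = mu + alpha_j + delta * (n1 - n2) / (2n); remove that term.
    adjusted.append(unadjusted - d_bar * (n1 - n2) / (2 * n))

print("sex difference %.2f +/- %.2f gms." % (d_bar, se_d))
print("adjusted generation means:", [round(a, 2) for a in adjusted])
```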

For comparisons among these adjusted generation means, standard
errors may be needed. The difference between the adjusted means of the
$j$th and $k$th generations is of the form

$$\bar{X}_{\cdot j\cdot} - \bar{X}_{\cdot k\cdot} + g\bar{D}$$

where $g$ is the numerical multiplier of $\bar{D}$. The variance of this difference is

$$s^2\left(\frac{1}{n_{\cdot j}} + \frac{1}{n_{\cdot k}} + \frac{g^2}{\sum W}\right)$$

With generations 1 and 2, $n_{\cdot 1} = 48$, $n_{\cdot 2} = 40$, while $g = 1/16 - 1/8 = -1/16$,
and $\sum W = 34.2$. The term in $g$ in the variance turns out to be
negligible. The variance of the difference is therefore

$$409\left(\frac{1}{48} + \frac{1}{40}\right) = 18.75$$

The adjusted difference is 3.62 ± 4.33.
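A short numerical check of this standard error, assuming the quantities quoted in the text ($s^2 = 409$, $\sum W = 34.2$, $\bar{D} = 56.77$). The adjusted means add $\bar{D}/16$ and $\bar{D}/8$ to the unadjusted means of generations 1 and 2, so the multiplier of $\bar{D}$ in their difference is $1/16 - 1/8$; only its square enters the variance. The sketch prints the standard error both with and without the small $g$ term.

```python
import math

s2_within = 409.0
sum_w = 34.2     # sum of the weights W_j, table 16.6.1
d_bar = 56.77    # estimated sex difference

# Unadjusted means and sizes of generations 1 and 2.
mean1, n1 = 1873 / 48, 48
mean2, n2 = 1274 / 40, 40

g = 1 / 16 - 1 / 8                  # multiplier of d_bar in (adjusted 1 - adjusted 2)
diff = mean1 - mean2 + g * d_bar

var_full = s2_within * (1 / n1 + 1 / n2 + g ** 2 / sum_w)
var_short = s2_within * (1 / n1 + 1 / n2)   # g term dropped, as in the text
print("difference %.2f, s.e. %.2f (%.2f with the g term)"
      % (diff, math.sqrt(var_short), math.sqrt(var_full)))
```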

If F-tests of the main effects of Sexes and Generations are wanted,
start with the preliminary S.S. for each factor in table 16.6.1. Subtract
from it the difference:

Correct Interaction S.S. minus Preliminary Interaction S.S.

= 3,181 - (-902) = 4,083
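The subtraction can be sketched as follows, using the sums of squares quoted in table 16.6.1. As a cross-check we use the standard identity that, in an R x 2 table, the adjusted S.S. for the two-level factor equals $(\sum WD)^2/\sum W$; the two figures agree to within the rounding of the tabled means.

```python
prelim = {"Sexes": 114287, "Generations": 5756, "Interactions": -902}
correct_interactions = 3181

# Correct Interaction S.S. minus Preliminary Interaction S.S.
correction = correct_interactions - prelim["Interactions"]
sexes_adjusted = prelim["Sexes"] - correction
generations_adjusted = prelim["Generations"] - correction

# Cross-check for the two-level factor: (sum W D)^2 / sum W from table 16.6.1.
check = 1941.64 ** 2 / 34.20
print(correction, sexes_adjusted, generations_adjusted, round(check))
```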
Verify the completed analysis of variance quoted from the original article:

Source of Variation       Degrees of Freedom   Sum of Squares   Mean Square
Sex (adjusted)                    1                 2,594.6         2,594.6
Strains (adjusted)                4               417,565.6       104,391.4
Interaction                       4                 8,805.3         2,201.3
Within sub-classes*             109               332,962.9         3,054.7

* You cannot, of course, verify this line.

You will not be able to duplicate these numbers exactly because the means are reported to
only 3 significant digits. Your results should approximate the first 3 figures in the mean
squares, enough for testing.

16.7 — The R x C table. Least squares analysis. This is a general
method for analyzing 2-way classifications (7). It fits an additive model
(i.e., one assuming no interactions) to the sub-class means:

$$\bar{X}_{ij\cdot} = \mu + \alpha_i + \beta_j + \bar{\varepsilon}_{ij\cdot}, \qquad i = 1, \ldots, r, \quad j = 1, \ldots, c,$$

where the $\bar{\varepsilon}_{ij\cdot}$ are assumed normally distributed with means zero and vari-
ances $\sigma^2/n_{ij}$, where $n_{ij}$ is the sub-class number. This amounts to assuming
that the variance within each sub-class is $\sigma^2$, since $\bar{\varepsilon}_{ij\cdot}$ is the mean of $n_{ij}$
such residuals.

As an intermediate step in the calculations, the method provides the
most powerful test of the null hypothesis that interactions are zero. If
this hypothesis is contradicted by the data, the calculations are usually
stopped and the investigator proceeds to examine the two-way table in
detail. If the assumption of negligible interactions is tenable, the remain-
der of the calculations give unbiased estimates of the row and column
main effects $\alpha_i$ and $\beta_j$ that have the smallest variances. Since data of this
type are common and are tedious to handle on a desk machine, most com-
puting centers are likely to have a standard program for the analysis.

The basic data used are the $n_{ij}$ and the row ($X_{i\cdot\cdot}$) and column ($X_{\cdot j\cdot}$)
totals of the observations. Table 16.7.1 shows the algebraic notation and
the mouse typhoid data of table 16.2.1 used as illustration. (The $p_{ij}$
are explained later.) Following Yates (7), we denote the row and column
totals of the $n_{ij}$ by $N_{i\cdot}$ and $N_{\cdot j}$.

The least squares method chooses estimates $m$, $a_i$, $b_j$ of $\mu$, $\alpha_i$, $\beta_j$ that
minimize

$$\sum_i \sum_j n_{ij}\left(\bar{X}_{ij\cdot} - m - a_i - b_j\right)^2$$

The resulting normal equation for $a_i$ is

$$N_{i\cdot}(m + a_i) + n_{i1}b_1 + n_{i2}b_2 + \cdots + n_{ic}b_c = X_{i\cdot\cdot} \qquad (16.7.1)$$

Thus, for Organism 9D, we have

$$98(m + a_1) + 34b_1 + 31b_2 + 33b_3 = 385$$

Note that the least squares method estimates $a_i$, the effect of the $i$th row,
by making the observed total for the $i$th row equal the value which the
model says it ought to have. Similarly, for the $j$th column,

$$N_{\cdot j}(m + b_j) + n_{1j}a_1 + n_{2j}a_2 + \cdots + n_{rj}a_r = X_{\cdot j\cdot} \qquad (16.7.2)$$
TABLE 16.7.1

Algebraic Notation and Data for Fitting the Additive Model

Algebraic notation

                                  Columns
Rows                         1        2      ...      c        Totals    Data Totals
1     n_1j                 n_11     n_12     ...    n_1c       N_1.      X_1..
      p_1j = n_1j/N_1.     p_11     p_12     ...    p_1c        1
2     n_2j                 n_21     n_22     ...    n_2c       N_2.      X_2..
      p_2j = n_2j/N_2.     p_21     p_22     ...    p_2c        1
...
r     n_rj                 n_r1     n_r2     ...    n_rc       N_r.      X_r..
      p_rj = n_rj/N_r.     p_r1     p_r2     ...    p_rc        1
Totals                     N_.1     N_.2     ...    N_.c       N_..
Data totals                X_.1.    X_.2.    ...    X_.c.                X_...

Data

                              Strain of Mice
Organism                  RI         Z          Ba        N_i.     X_i..     Mean
9D      n_1j              34         31         33          98       385     3.929
        p_1j           0.34694    0.31633    0.33673        1
11C     n_2j              66         78        113         257     1,442     5.611
        p_2j           0.25681    0.30350    0.43969        1
DSC1    n_3j             107        133        188         428     2,523     5.895
        p_3j           0.25000    0.31075    0.43925        1
N_.j                     207        242        334         783
X_.j.                  1,271      1,692      1,387                 4,350     5.556
b_j                    2.1251     2.8986       0

From (16.7.1) we see that if we know the $b$'s, we can find $(m + a_i)$, while
if we know the $a$'s, (16.7.2) gives $(m + b_j)$. The next step is to eliminate
either the $a$'s or the $b$'s. Time is usually saved by eliminating the more
numerous set of constants, though an investigator interested only in the
rows may prefer to eliminate the columns. In this example, with r = c = 3,
it makes no difference. We shall eliminate the $a$'s (rows).

When the $a$'s are eliminated, $m$ also disappears. In finding the equa-
tions for the $b$'s, it helps to divide each $n_{ij}$ by its row total $N_{i\cdot}$, forming the
$p_{ij}$. The equations for the $b$'s are derived by a rule that is easily remem-
bered. The first equation is

$$(N_{\cdot 1} - n_{11}p_{11} - \cdots - n_{r1}p_{r1})\,b_1 - (n_{11}p_{12} + \cdots + n_{r1}p_{r2})\,b_2 - \cdots - (n_{11}p_{1c} + \cdots + n_{r1}p_{rc})\,b_c = X_{\cdot 1\cdot} - p_{11}X_{1\cdot\cdot} - \cdots - p_{r1}X_{r\cdot\cdot}$$

For the mice, the first equation is

$$[207 - (34)(0.34694) - \cdots - (107)(0.25000)]\,b_1 - [(34)(0.31633) + \cdots + (107)(0.31075)]\,b_2 - [(34)(0.33673) + \cdots + (107)(0.43925)]\,b_3 = 1{,}271 - (0.34694)(385) - \cdots - (0.25000)(2{,}523)$$

In the $j$th equation the term in $b_j$ is $N_{\cdot j}$ minus the sum of products of the
$n$'s and $p$'s in that column. The term in $b_k$ is minus the sum of products
of the $n_{ij}$ and the $p_{ik}$. The three equations are:

$$\begin{aligned}
151.505b_1 - 64.036b_2 - 87.468b_3 &= 136.35 \\
-64.036b_1 + 167.191b_2 - 103.155b_3 &= 348.54 \qquad (16.7.2a) \\
-87.468b_1 - 103.155b_2 + 190.624b_3 &= -484.92
\end{aligned}$$

The sum of the numbers in each of the four columns above adds to zero,
apart from rounding errors. This is a useful check.
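The rule for forming the reduced equations translates directly into a double loop. The sketch below is an illustration in Python (the name `reduced_equations` is ours) that builds the coefficients and right sides for the mice data from the $n_{ij}$, $X_{i\cdot\cdot}$ and $X_{\cdot j\cdot}$ of table 16.7.1, then applies the zero-sum check. Carried to full precision the right sides come out as about 136.36, 348.55 and -484.91.

```python
# Mice data from table 16.7.1: rows are organisms 9D, 11C, DSC1;
# columns are strains RI, Z, Ba.
n = [
    [34, 31, 33],
    [66, 78, 113],
    [107, 133, 188],
]
row_totals = [385, 1442, 2523]    # X_i..
col_totals = [1271, 1692, 1387]   # X_.j.


def reduced_equations(n, row_totals, col_totals):
    """Eliminate the row constants; return (coef, rhs) of the equations in the b's."""
    r, c = len(n), len(n[0])
    N_row = [sum(row) for row in n]
    N_col = [sum(n[i][j] for i in range(r)) for j in range(c)]
    p = [[n[i][j] / N_row[i] for j in range(c)] for i in range(r)]
    coef = [[0.0] * c for _ in range(c)]
    rhs = [0.0] * c
    for j in range(c):
        for k in range(c):
            # Term in b_k of equation j: minus the sum over rows of n_ij * p_ik,
            # plus N_.j on the diagonal.
            total = sum(n[i][j] * p[i][k] for i in range(r))
            coef[j][k] = (N_col[j] if j == k else 0.0) - total
        rhs[j] = col_totals[j] - sum(p[i][j] * row_totals[i] for i in range(r))
    return coef, rhs


coef, rhs = reduced_equations(n, row_totals, col_totals)
for row, q in zip(coef, rhs):
    print("  ".join("%9.3f" % v for v in row), "=", "%8.2f" % q)

# Check from the text: each of the four columns sums to zero.
print([round(sum(coef[j][k] for j in range(3)), 6) for k in range(3)], round(sum(rhs), 6))
```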

In previous analyses of 2-way tables in this book, we have usually
assumed $\sum b_j = 0$. In solving these equations it is easier to assume $b_3 = 0$.
(This gives exactly the same results for any comparison among the $b$'s.)
Drop $b_3$ from the first two equations and drop the third equation, solving
the equations

$$\begin{aligned}
151.505b_1 - 64.036b_2 &= 136.35 \\
-64.036b_1 + 167.191b_2 &= 348.54
\end{aligned}$$

The inverse of the 2 x 2 matrix (section 13.4) is

$$\begin{pmatrix} 0.0078753 & 0.0030163 \\ 0.0030163 & 0.0071365 \end{pmatrix}$$

giving

$$b_1 = 2.1251, \qquad b_2 = 2.8986, \qquad b_3 = 0$$

The sum of squares for columns, adjusted for rows, is given by the sum
of products of the $b$'s with the right sides of equations (16.7.2a).

$$\text{Column S.S. (adjusted)} = (2.1251)(136.35) + (2.8986)(348.54) = 1{,}300$$
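Solving the two retained equations by inverting the 2 x 2 matrix, and forming the adjusted Column S.S., can be sketched in Python with the coefficients quoted in (16.7.2a). The closed-form inverse of a symmetric 2 x 2 matrix is used here instead of a general solver.

```python
# First two equations of (16.7.2a) with b3 set to 0.
a11, a12, a22 = 151.505, -64.036, 167.191
q1, q2 = 136.35, 348.54

# Inverse of the symmetric matrix [[a11, a12], [a12, a22]].
det = a11 * a22 - a12 * a12
c11, c12, c22 = a22 / det, -a12 / det, a11 / det

b1 = c11 * q1 + c12 * q2
b2 = c12 * q1 + c22 * q2

# Column S.S. adjusted for rows: sum of products of the b's with the right sides.
ss_columns_adjusted = b1 * q1 + b2 * q2

print("inverse: %.7f %.7f %.7f" % (c11, c12, c22))
print("b1 = %.4f, b2 = %.4f, b3 = 0" % (b1, b2))
print("Column S.S. (adjusted) = %.0f" % ss_columns_adjusted)
```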

The analysis of variance can now be completed and the Interactions
S.S. tested. Compute the S.S. Between sub-classes and the unadjusted
S.S. for Rows and Columns, these being, respectively,

$$\sum_i\sum_j \frac{X_{ij\cdot}^2}{n_{ij}} - C; \qquad \sum_i \frac{X_{i\cdot\cdot}^2}{N_{i\cdot}} - C; \qquad \sum_j \frac{X_{\cdot j\cdot}^2}{N_{\cdot j}} - C$$

where $C = X_{\cdot\cdot\cdot}^2/N_{\cdot\cdot}$. The results are shown in the top half of table
16.7.2.

In the completed analysis of variance, the S.S. Between sub-classes,
1,786, can be partitioned either into

Rows S.S. (unadjusted) + Columns S.S. (adjusted) + Interactions S.S.      (16.7.3)

or into

Rows S.S. (adjusted) + Columns S.S. (unadjusted) + Interactions S.S.      (16.7.4)

Since we now know that Rows S.S. (unadjusted) = 309 and Columns S.S.
(adjusted) = 1,300, the first of these relations gives the Interactions S.S. as

$$1{,}786 - 309 - 1{,}300 = 177$$

The d.f. are $(r - 1)(c - 1) = 4$ in this example. The second relation pro-
vides the Rows S.S. (adjusted). The completed analysis of variance ap-
pears in the lower half of table 16.7.2.
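The two partitions amount to a few subtractions. This sketch uses the sums of squares quoted in the text for the mice data; the dictionary of mean squares is only for display.

```python
# Sums of squares quoted in the text for the mice data.
between = 1786
rows_unadjusted, columns_unadjusted = 309, 1227
columns_adjusted = 1300

# First partition: Rows (unadjusted) + Columns (adjusted) + Interactions.
interactions = between - rows_unadjusted - columns_adjusted
# Second partition: Rows (adjusted) + Columns (unadjusted) + Interactions.
rows_adjusted = between - columns_unadjusted - interactions

r = c = 3
mean_squares = {
    "Rows (adjusted)": rows_adjusted / (r - 1),
    "Columns (adjusted)": columns_adjusted / (c - 1),
    "Interactions": interactions / ((r - 1) * (c - 1)),
}
print(interactions, rows_adjusted, mean_squares)
```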


TABLE 16.7.2

Analysis of Variance of the Mice Data

Source of Variation                Degrees of Freedom   Sum of Squares      Mean Square

Preliminary (Unadjusted)
Between sub-classes                        8                1,786
Rows (Organisms), unadjusted               2                  309
Columns (Strains), unadjusted              2                1,227

Completed
Rows (Organisms), unadjusted               2                  309 } 1,609
Columns (Strains), adjusted                2                1,300 }              650.0
Rows (Organisms), adjusted                 2                  382 } 1,609        191.0
Columns (Strains), unadjusted              2                1,227 }
Interactions                               4                  177                 44.2
Within sub-classes                       774                                       5.015


As in the approximate analyses, interactions are shown to be present,
so that ordinarily the analysis would not be carried further; the data
would be interpreted as in section 16.2. But to illustrate the computations
we proceed as though there were no interaction. The mean squares for
F-tests of the main effects of rows and columns are the adjusted mean
squares in table 16.7.2.

The standard error of any comparison $\sum L_j b_j$ among the column
main effects is

$$s\sqrt{\sum_j L_j^2 c_{jj} + 2\sum_{j<k} L_j L_k c_{jk}}$$

where $s = \sqrt{5.015} = 2.24$ and the $c_{jk}$ are the inverse multipliers in (16.7.3).
Since $b_3$ was arbitrarily made 0, all $c_{j3}$ are 0. As examples,

$$\text{S.E.}(b_1 - b_2) = 2.24\sqrt{0.00788 + 0.00714 - 2(0.00302)} = \pm 0.212$$

$$\text{S.E.}(b_1 - b_3) = 2.24\sqrt{0.00788} = \pm 0.199$$
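The standard-error formula for a column comparison is a quadratic form in the inverse multipliers. In this Python sketch the $c_{jk}$ are copied from the text, with the row and column for $b_3$ set to zero because $b_3$ was fixed at 0. Summing over all ordered pairs $(j, k)$ is the same as $\sum L_j^2 c_{jj} + 2\sum_{j<k} L_j L_k c_{jk}$, since the matrix is symmetric. The helper name is hypothetical.

```python
import math

s = math.sqrt(5.015)  # Within sub-classes mean square, table 16.7.2
# Inverse multipliers c_jk; the entries for b3 are zero because b3 was set to 0.
c = [
    [0.0078753, 0.0030163, 0.0],
    [0.0030163, 0.0071365, 0.0],
    [0.0, 0.0, 0.0],
]


def se_column_comparison(L):
    """Standard error of sum_j L_j b_j: s * sqrt(sum over j, k of L_j L_k c_jk)."""
    var = sum(L[j] * L[k] * c[j][k] for j in range(3) for k in range(3))
    return s * math.sqrt(var)


se_12 = se_column_comparison([1, -1, 0])
se_13 = se_column_comparison([1, 0, -1])
print("S.E.(b1 - b2) = %.3f, S.E.(b1 - b3) = %.3f" % (se_12, se_13))
```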

The row main effects can be obtained from (16.7.1), rewritten as

$$m + a_i = \bar{X}_{i\cdot\cdot} - p_{i1}b_1 - \cdots - p_{ic}b_c \qquad (16.7.5)$$

In table 16.7.1, the $\bar{X}_{i\cdot\cdot}$ are in the right-hand column and the $b_j$ are at the
foot of each column. Relation (16.7.5) gives

$$m + a_1 = 3.929 - (0.34694)(2.1251) - (0.31633)(2.8986) = 2.275$$

Similarly, we find

$$m + a_2 = 4.186, \qquad m + a_3 = 4.463$$

From (16.7.5) any comparison $\sum_i L_i(m + a_i)$ among the row means is
of the form

$$\sum_i L_i\bar{X}_{i\cdot\cdot} - \sum_j g_j b_j, \qquad \text{where } g_j = \sum_i L_i p_{ij}$$

To find the variance of this comparison, multiply $s^2$ by

$$\sum_i \frac{L_i^2}{N_{i\cdot}} + \sum_j g_j^2 c_{jj} + 2\sum_{j<k} g_j g_k c_{jk}$$

For example, the difference $a_2 - a_1 = 1.911$ is

$$\bar{X}_{2\cdot\cdot} - \bar{X}_{1\cdot\cdot} + 0.0901b_1 + 0.0128b_2$$

The multiplier of $s^2$ is, therefore,

$$\frac{1}{98} + \frac{1}{257} + (0.0901)^2(0.00788) + (0.0128)^2(0.00714) + 2(0.0901)(0.0128)(0.00302) = 0.01417$$

The s.e. is $\pm\sqrt{(5.015)(0.01417)} = \pm 0.266$.
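The row means from (16.7.5) and the standard error of a row comparison can be reproduced with the sketch below. It assumes, as the calculation in the text does, that the row means $\bar{X}_{i\cdot\cdot}$ are uncorrelated with the $b$'s, so their variances simply add. The $b_j$ and $c_{jk}$ are copied from the text; the helper `row_comparison` is our own name.

```python
import math

s2 = 5.015
N_row = [98, 257, 428]
row_totals = [385, 1442, 2523]
n = [[34, 31, 33], [66, 78, 113], [107, 133, 188]]
p = [[nij / N_row[i] for nij in n[i]] for i in range(3)]
b = [2.1251, 2.8986, 0.0]
c = [[0.0078753, 0.0030163], [0.0030163, 0.0071365]]  # b3 = 0, so only b1, b2 enter

# Relation (16.7.5): m + a_i = row mean - sum_j p_ij b_j
row_means = [row_totals[i] / N_row[i] - sum(p[i][j] * b[j] for j in range(3))
             for i in range(3)]


def row_comparison(L):
    """Estimate, s.e. and variance multiplier of sum_i L_i (m + a_i)."""
    est = sum(Li * m for Li, m in zip(L, row_means))
    g = [sum(L[i] * p[i][j] for i in range(3)) for j in range(2)]  # multipliers of b1, b2
    mult = sum(L[i] ** 2 / N_row[i] for i in range(3))
    mult += sum(g[j] * g[k] * c[j][k] for j in range(2) for k in range(2))
    return est, math.sqrt(s2 * mult), mult


est, se, mult = row_comparison([-1, 1, 0])  # a2 - a1
print([round(m, 3) for m in row_means])
print("a2 - a1 = %.3f, multiplier %.5f, s.e. %.3f" % (est, mult, se))
```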

In the original data the overall mean is $\bar{X}_{\cdot\cdot\cdot} = 4{,}350/783 = 5.556$
(table 16.7.1). Our three estimated row means are all less than 5.556.
This is a consequence of the choice of $b_3 = 0$ to simplify the arithmetic.
Although this choice has no effect on any comparison among the row
means $m + a_i$ or the column means $m + b_j$, it is sometimes desirable to
adjust the $m + a_i$ and the $m + b_j$ so that $m$ becomes $\bar{X}_{\cdot\cdot\cdot}$. To do this,
calculate the weighted mean of the $m + a_i$ with weights $N_{i\cdot}$; that is,

$$[(98)(2.275) + (257)(4.186) + (428)(4.463)]/783 = 4.098$$

Since $\bar{X}_{\cdot\cdot\cdot} = 5.556$, we add 1.458 to each $m + a_i$, giving 3.733, 5.644,
and 5.921 for the row means. To make the column means average in
the same way to the general mean, compute these means as $\bar{X}_{\cdot\cdot\cdot} + b_j - 1.458$,
giving the values 6.223, 6.997, 4.098.
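The re-centering step is sketched below with the rounded row means and $b_j$ from the text. Summing (16.7.1) over the rows gives $\sum_i N_{i\cdot}(m + a_i) + \sum_j N_{\cdot j}b_j = X_{\cdot\cdot\cdot}$, so the shift 1.458 can also be computed as $\sum_j N_{\cdot j}b_j/N_{\cdot\cdot}$. That is why the same amount is subtracted when the column means are formed. The sketch computes it both ways as a check.

```python
N_row = [98, 257, 428]
N_col = [207, 242, 334]
row_means = [2.275, 4.186, 4.463]   # m + a_i with b3 = 0
b = [2.1251, 2.8986, 0.0]
N = 783
grand_mean = 4350 / N

weighted_row_mean = sum(Ni * m for Ni, m in zip(N_row, row_means)) / N
shift = grand_mean - weighted_row_mean
# Summing (16.7.1) over rows shows the same shift equals sum_j N_.j b_j / N..
shift_from_columns = sum(Nj * bj for Nj, bj in zip(N_col, b)) / N

row_means_centered = [m + shift for m in row_means]
col_means_centered = [grand_mean + bj - shift for bj in b]
print(round(weighted_row_mean, 3), round(shift, 3), round(shift_from_columns, 3))
print([round(m, 3) for m in row_means_centered], [round(m, 3) for m in col_means_centered])
```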

In a 3-way classification the exact methods naturally become more
complicated. There are now three two-factor interactions and one three-
factor interaction. An example worked in detail is given by Stevens (8).

The exact analysis of variance can still be computed by elementary
methods if the sub-class numbers are proportional, that is, if

$$n_{ijk} = \frac{(N_{i\cdot\cdot})(N_{\cdot j\cdot})(N_{\cdot\cdot k})}{N_{\cdot\cdot\cdot}^2}$$

Federer and Zelen (9) present exact methods for computing the sum
of squares due to any main effect or interaction, assuming all other effects
present. They also describe simpler methods that provide close upper
bounds to these sums of squares. Their methods are valid for any number
of classes.

EXAMPLE 16.7.1 — In the farm tenancy example in section 16.4 there was no evidence
of interaction. The following are the least squares estimates of the main effect means for
tenure and soils.

Tenure:   Owner 32.507     Renter 51.072     Mixed 45.031
Soils:    I 48.157         II 46.999         III 40.480

Your results may differ a little, depending on the number of decimals carried. The results
above were adjusted so that $\sum N_{i\cdot}a_i = \sum N_{\cdot j}b_j = 0$. Note the excellent agreement with the
means shown under table 16.4.2 for the method of proportional numbers.

EXAMPLE 16.7.2 — In the mice data verify the following estimates and standard
errors as given by the use of equal weights within rows (section 16.3) and the least squares
analysis (section 16.7).

                Equal Weights Within Rows      Least Squares
11C - 9D             1.919 ± 0.265             1.911 ± 0.266
Z - RI               0.755 ± 0.196             0.774 ± 0.212

16.8 — The analysis of proportions in 2-way tables. In chapter 9 we 
discussed methods of analysis for a binomial proportion. Sections 9.8- 
9.11 dealt with a set of C proportions arranged in a one-way classification. 
Two-way tables in which the entry in every cell is a sample proportion 
are also common. Examples are sample survey results giving the per- 
centage of voters who stated their intention to vote Democratic, classified 
by the age and income level of the voter, or a study of the proportion of 
patients with blood group O in a large hospital, classified by sex and type 
of illness. 

The data consist of $rc$ independent values of a binomial proportion
$p_{ij} = g_{ij}/n_{ij}$, arranged in $r$ rows and $c$ columns. The data resemble those
in the preceding section, but instead of having a sample of continuous
measurements $X_{ijk}$ ($k = 1, 2, \ldots, n_{ij}$) in the $i, j$ cell, we have a binomial
proportion $p_{ij}$. The questions of interest are usually the same in the bi-
nomial and the continuous cases. We want to examine whether row and
column effects are additive, and if so, to estimate them and make com-
parisons among rows and among columns. If interactions are present,
the nature of the interactions is studied.

From the viewpoint of theory, the analysis of proportions presents 
more difficulties than that of normally distributed continuous variables. 
Few exact results are available. The approximate methods used in prac- 
tice mostly depend on one of the following approaches. 

1. Regard $p_{ij}$ as a normally distributed variable with variance
$p_{ij}q_{ij}/n_{ij}$, using the weighted methods of analysis in preceding sections,
with weights $w_{ij} = n_{ij}/(p_{ij}q_{ij})$ and $p_{ij}$ replacing $\bar{X}_{ij\cdot}$.

2. Transform the $p_{ij}$ to equivalent angles $y_{ij}$ (section 11.16), and treat
the $y_{ij}$ as normally distributed. Since the variance of $y_{ij}$ for any $p_{ij}$ is
approximately $821/n_{ij}$, this method has the advantage that if the $n_{ij}$ are
constant, the analysis of variance of the $y_{ij}$ is unweighted. As we have
seen, this transformation is frequently used in randomized blocks experi-
ments in which the measurement is a proportion.

3. Transform $p_{ij}$ to its logit $Y_{ij} = \ln(p_{ij}/q_{ij})$. The estimated vari-
ance of $Y_{ij}$ is approximately $1/(n_{ij}p_{ij}q_{ij})$, so that in a logit analysis, $Y_{ij}$
is given a weight $n_{ij}p_{ij}q_{ij}$.

The assumptions involved in these approaches probably introduce
little error in the conclusions if the observed numbers of successes and
failures, $n_{ij}p_{ij}$ and $n_{ij}q_{ij}$, exceed 20 in every cell. Various small-sample
adjustments have been prepared to extend the validity of the methods.
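The three approaches differ only in the scale and the observed weight attached to each cell. The Python sketch below summarizes a single cell on all three scales. The function name is hypothetical, and the 156/240 cell (the long cuttings planted at once, from table 16.9.1) is used only as a convenient large cell. The angle is in degrees, which is where the constant $(180/\pi)^2/4 \approx 821$ comes from.

```python
import math


def cell_summaries(g, n):
    """Observed-weight summaries of one binomial cell (g successes in n) on three scales."""
    p = g / n
    q = 1 - p
    return {
        # 1. p scale: variance pq/n, weight n/(pq)
        "p": p,
        "weight_p": n / (p * q),
        # 2. angular scale in degrees: variance approximately 820.7/n
        "angle": math.degrees(math.asin(math.sqrt(p))),
        "var_angle": math.degrees(1) ** 2 / (4 * n),
        # 3. logit scale: variance approximately 1/(npq), weight npq
        "logit": math.log(p / q),
        "weight_logit": n * p * q,
        # Rule of thumb from the text: successes and failures both exceed 20
        "large_enough": g > 20 and n - g > 20,
    }


summary = cell_summaries(156, 240)
for key, value in summary.items():
    print(key, value)
```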

When all $p_{ij}$ lie between 25% and 75%, the results given by the three
approaches seldom differ materially. If, however, the $p_{ij}$ cover a wide
range from close to zero up to 50% or beyond, there are reasons for
expecting that row and column effects are more likely to be additive on a
logit scale than on the original p scale. To repeat an example cited in
section 9.14, suppose that the data are the proportions of cases in which the
driver of the car suffered injury in automobile accidents classified by 
severity of impact (rows) and by whether the driver wore a seat belt or not 
(columns). Under very mild impacts p is likely to be close to zero for both 
wearers and non-wearers, with little if any difference between the two 
columns. At the other end, under extreme impacts, p will be near 100% 
whether a seat belt was worn or not, with again a small column effect. 
The beneficial effect of the belts, if any, will be revealed by the accidents 
that show intermediate proportions of injuries. The situation is familiar 
in biological assay in which the toxic or protective effects of different agents 
are being compared. It is well known that two agents cannot be effec- 
tively compared at concentrations for which p is close to zero or 100%; 
instead, the investigator aims at concentrations yielding p around 50%. 

Thus, in the scale of p, row and column effects cannot be strictly addi-
tive over the whole range. The logit transformation pulls out the scale
near 0 and 100%, so that the scale extends from $-\infty$ to $+\infty$. In the
logit analysis row and column effects may be additive, whereas in the p
scale for the same data we might have interactions that are entirely a con-
sequence of the scale. The angular transformation occupies an inter-
mediate position. As with logits, the scale is stretched at the ends, but the
total range remains finite, from 0° to 90°.

To summarize, with an analysis in the original scale it is easier to
think about the meaning and practical importance of effects in this scale.
The advantage of angles is the simplicity of the computations if the $n_{ij}$ are
equal or nearly so. Logits may permit an additive model to be used in
tables showing large effects. In succeeding sections some examples will
be given to illustrate the computations for analyses in the original and the
logit scales.
The preceding analyses utilize observed weights, the weight $W = n/pq$
attached to the proportion $p$ in a cell being computed from the observed
value of $p$. Instead, when fitting the additive model we could use expected
weights $W = n/\hat{p}\hat{q}$, where $\hat{p}$ is the estimate given by the additive model.
This approach involves a process of successive approximations. We guess
first approximations to the weights and fit the model, obtaining second
approximations to the $\hat{p}$. From these, the weights are recomputed and the
model fitted again, giving third approximations to the $\hat{p}$ and the $W$, and
so on until no appreciable change occurs in the results.

This series of calculations may be shown to give successive approxi-
mations to maximum likelihood estimates of the $p$ (11). When $np$ and
$nq$ are large in the cells, analyses by observed and expected weights agree
closely. In small samples it is not yet clear that either method has a con-
sistent advantage. Observed weights require less computation.

A word of caution: we are assuming that in any cell there is a single
binomial proportion. Sometimes the data in a cell come from several
binomials with different p's. In a study of absenteeism among clerical
workers, classified by age and sex, the basic measurement might be the
proportion of working days in a year on which the employee was absent.
But the data in a cell, e.g., men aged 20-25, might come from 18 different
men who fall into this cell. In this event the basic variable is $p_{ijk}$, the pro-
portion of days absent for the $k$th man in the $i, j$ cell. Usually it is ade-
quate to regard $p_{ijk}$ as a continuous variate, performing the analysis
by the methods in preceding sections.

16.9 — Analysis in the p scale: a 2 x 2 table. In this and the next
section, two examples are given to illustrate situations in which direct 
analysis of the proportions is satisfactory. Table 16.9.1 shows data cited 
by Bartlett (12) from an experiment in which root cuttings of plum trees 
were planted as a means of propagation. The factors are length of cutting 
(rows) and whether planting was done at once or in spring (columns). 

TABLE 16.9.1

Percentages of Surviving Plum Root-stocks From 240 Cuttings

Length of                          Time of Planting
Cutting           At Once                                 Spring
Long      p11 = 156/240 = 65.0%                  p12 = 84/240 = 35.0%
          v11 = (65.0)(35.0)/240 = 9.48          v12 = (35.0)(65.0)/240 = 9.48
Short     p21 = 107/240 = 44.6%                  p22 = 31/240 = 12.9%
          v21 = (44.6)(55.4)/240 = 10.30         v22 = (12.9)(87.1)/240 = 4.68

In the (1, 1) cell, 156 plants survived out of 240, giving $p_{11}$ = 65.0%.
The estimated variances $v$ for each $p$ are also shown.

The analysis resembles that of section 16.5, the $p_{ij}$ replacing the $\bar{X}_{ij\cdot}$.
To test for interaction we compute
$$p_{11} + p_{22} - p_{12} - p_{21} = 65.0 + 12.9 - 35.0 - 44.6 = -1.7\%$$

Its standard error is

$$\sqrt{v_{11} + v_{22} + v_{12} + v_{21}} = \sqrt{33.94} = \pm 5.83$$

Since there is no indication of interaction, the calculation of row and col-
umn effects proceeds as in table 16.9.2. For the column difference in row
1, the variance is $v_{11} + v_{12} = 18.96$. The overall column difference is a
weighted mean of the differences in the two rows, weighted inversely as
the estimated variances. Both main effects are large relative to their stan-
dard errors. Clearly, long cuttings planted at once have the best survival
rate.
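The whole 2 x 2 analysis in the p scale, from the interaction test to the inverse-variance weighted main effects, can be sketched in Python from the survivor counts alone. Because this sketch carries the percentages unrounded, the At Once minus Spring effect comes out as about 30.9 rather than the 31.0 obtained from the one-decimal figures in the tables. The dictionary keys and the helper name are our own.

```python
import math

# Survivors out of 240 cuttings in each cell of table 16.9.1.
n = 240
survivors = {
    ("long", "once"): 156, ("long", "spring"): 84,
    ("short", "once"): 107, ("short", "spring"): 31,
}
p = {cell: 100 * g / n for cell, g in survivors.items()}   # percentages
v = {cell: p[cell] * (100 - p[cell]) / n for cell in p}    # estimated variances

interaction = (p[("long", "once")] + p[("short", "spring")]
               - p[("long", "spring")] - p[("short", "once")])
se_interaction = math.sqrt(sum(v.values()))


def weighted_difference(pairs):
    """Inverse-variance weighted mean of p_a - p_b over the pairs; returns (estimate, s.e.)."""
    sum_w = sum_wd = 0.0
    for a, b in pairs:
        w = 1 / (v[a] + v[b])
        sum_w += w
        sum_wd += w * (p[a] - p[b])
    return sum_wd / sum_w, math.sqrt(1 / sum_w)


once_minus_spring = weighted_difference(
    [(("long", "once"), ("long", "spring")), (("short", "once"), ("short", "spring"))])
long_minus_short = weighted_difference(
    [(("long", "once"), ("short", "once")), (("long", "spring"), ("short", "spring"))])

print("interaction %.1f +/- %.2f" % (interaction, se_interaction))
print("at once - spring %.1f +/- %.2f" % once_minus_spring)
print("long - short %.1f +/- %.2f" % long_minus_short)
```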


TABLE 16.9.2

Calculation of Row and Column Effects

              At Once                    Spring
Long     p11 = 65.0   v11 = 9.48    p12 = 35.0   v12 = 9.48     D = 30.0   V = 18.96   W = 0.0527
Short    p21 = 44.6   v21 = 10.30   p22 = 12.9   v22 = 4.68     D = 31.7   V = 14.98   W = 0.0668

Long - Short:
         D = 20.4                   D = 22.1
         V = 19.78                  V = 14.16
         W = 0.0506                 W = 0.0706

Main Effects:

$$\text{At Once} - \text{Spring}: \quad \sum WD/\sum W = 31.0\%, \qquad \text{S.E.} = \sqrt{1/\sum W} = \pm 2.89$$

$$\text{Long} - \text{Short}: \quad \sum WD/\sum W = 21.4\%, \qquad \text{S.E.} = \sqrt{1/\sum W} = \pm 2.87$$


In Bartlett’s original paper (12), these data were used to demon- 
strate how to test for interaction in the logit scale. (He regarded the 
data as a 2 x 2 x 2 contingency table and was testing the three-factor 
interaction among the factors alive-dead, long-short, at once-spring.) 
However, the data show no sign of interaction either in the p or the logit 
scale. 

16.10 — Analysis in the p scale: a 3 x 2 table. In the second example,
inspection of the individual proportions indicates interactions that are
due to the nature of the factors and would not be removed by a logit
transformation. The data are the proportions of children with emotional
problems in a study of family medical care (13), classified by the number of
children in the family and by whether both, one, or no parents were re-
corded as having emotional problems, as shown in table 16.10.1.


TABLE 16.10.1

Proportion of Children With Emotional Problems

Parents With              Number of Children in Family
Problems                  1-2                      3-4
Both               p = 33/57 = 0.579        p = 15/38 = 0.395
One                p = 18/54 = 0.333        p = 17/55 = 0.309
None               p = 10/37 = 0.270        p = 9/32 = 0.281


In families having one or no parents with emotional problems the
four values of $p$ are close to 0.3, any differences being easily accountable
by sampling errors. Thus there is no sign of an effect of number of children
or of the parents' status when neither or only one parent has emotional
problems. When both parents have problems there is a marked increase
in $p$ in the smaller families to 0.579 and a modest increase in the larger
families to 0.395. Thus, inspection suggests that the proportion of chil-
dren with emotional problems is increased when both parents have prob-
lems, and that this increase is reduced in the larger families.

Consequently, the statistical analysis would probably involve little 
more than tests of the differences (0.579 - 0.333), (0.395 - 0.309), and 
(0.579 — 0.395), which require no new methods. The first difference is 
significant at the 5% level but the other two are not, so that any conclusions 
must be tentative. In data of this type nothing seems to be gained by 
transformation to logits. Reference (13) presents additional data bearing 
on the scientific issue. 
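The text does not say which test was used for the three differences. As an assumption, the Python sketch below uses the simple normal approximation with unpooled variances $pq/n$, matching the variances used in section 16.9; it reproduces the stated pattern (only the first difference is significant at the 5% level).

```python
import math

# (children with problems, children) from table 16.10.1
both_small, one_small = (33, 57), (18, 54)
both_large, one_large = (15, 38), (17, 55)


def z_difference(a, b):
    """Normal-approximation z for p_a - p_b, using unpooled variances pq/n."""
    pa, pb = a[0] / a[1], b[0] / b[1]
    se = math.sqrt(pa * (1 - pa) / a[1] + pb * (1 - pb) / b[1])
    return (pa - pb) / se


comparisons = {
    "both vs one, 1-2 children": z_difference(both_small, one_small),
    "both vs one, 3-4 children": z_difference(both_large, one_large),
    "both parents, 1-2 vs 3-4": z_difference(both_small, both_large),
}
for label, z in comparisons.items():
    print("%-28s z = %.2f  significant at 5%%: %s" % (label, z, abs(z) > 1.96))
```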

16.11 — Analysis of logits in an R x C table. When the fitting of an
additive model in the logit scale is appropriate, the following procedure
should be an adequate approximation:

1. If $p$ is a binomial proportion obtained from $g$ successes out of $n$
trials in a typical cell of the 2-way table, calculate the logit as

$$Y = \ln\left(\frac{g + 1/2}{n - g + 1/2}\right)$$

in each cell, where ln denotes the log to base e.

2. Assign to the logit a weight $W = (g + 1/2)(n - g + 1/2)/(n + 1)$.
In large samples, with all values of $g$ and $(n - g)$ exceeding 30, $Y$ will be
essentially $\ln(p/q)$ and the weight $npq$, which may be used if preferred. The
values suggested here for $Y$ and $W$ in small samples are based on research
by Gart and Zweifel (14). See example 16.12.3.
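Steps 1 and 2 can be written as a small function. In this Python sketch the function name is ours. It shows that the adjusted logit stays finite when $g = 0$, where $\ln(p/q)$ is undefined, and that for a large cell (156 out of 240, borrowed from section 16.9 purely for illustration) $Y$ and $W$ are close to $\ln(p/q)$ and $npq$.

```python
import math


def gart_zweifel_logit(g, n):
    """Small-sample logit and weight from steps 1 and 2: g successes out of n."""
    y = math.log((g + 0.5) / (n - g + 0.5))
    w = (g + 0.5) * (n - g + 0.5) / (n + 1)
    return y, w


# Finite even when g = 0, where ln(p/q) would be minus infinity.
y0, w0 = gart_zweifel_logit(0, 10)
print(y0, w0)

# In a large cell the adjusted values are close to ln(p/q) and npq.
g, n = 156, 240
p = g / n
y, w = gart_zweifel_logit(g, n)
print(y, math.log(p / (1 - p)), w, n * p * (1 - p))
```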

3. Then follow the method of fitting described for continuous data 
in section 16.7, with Y tJ in place of X iy and with W XJ in place of n XJ as 
weights. Section 16.7 should be studied carefully. 

4. The analysis of variance of Y is like table 1 6.7.2, but has no “With- 
in sub-classes” line. If the additive model fits, the Interactions sum of 
squares is distributed approximately as x 2 ( r l)(c- 1) d.f % A 
significant value of x 2 is a sign that the model does not fit. This test should 
be supplemented by inspection of the deviations Y l} - ? xj to note any syste- 
matic pattern that suggests that the model distorts the data. 

5. If the model fits and the inverse multipliers c_jk have been computed
for the columns, the s.e. of any linear function ΣL_j b_j of the column main
effects is

$$\sqrt{\sum_j L_j^2 c_{jj} + 2\sum_{j<k} L_j L_k c_{jk}}$$
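Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original procedure; the function names and the cell with g = 40 successes out of n = 100 trials are hypothetical. It places the small-sample logit and weight beside the large-sample values ln(p/q) and npq.

```python
import math


def small_sample_logit_and_weight(g, n):
    """Logit Y and weight W for g successes out of n trials (steps 1 and 2).

    Adding 1/2 to g and to n - g keeps Y finite when g = 0 or g = n.
    """
    y = math.log((g + 0.5) / (n - g + 0.5))
    w = (g + 0.5) * (n - g + 0.5) / (n + 1)
    return y, w


def large_sample_logit_and_weight(g, n):
    """Large-sample alternatives Y = ln(p/q) and W = npq (needs 0 < g < n)."""
    p = g / n
    q = 1 - p
    return math.log(p / q), n * p * q


# Hypothetical cell: 40 successes in 100 trials.
y_small, w_small = small_sample_logit_and_weight(40, 100)
y_large, w_large = large_sample_logit_and_weight(40, 100)
print(f"small-sample: Y = {y_small:.4f}, W = {w_small:.2f}")
print(f"large-sample: Y = {y_large:.4f}, W = {w_large:.2f}")

# With g = 0 the small-sample logit is still defined.
print(small_sample_logit_and_weight(0, 10))
```

With g and n - g this large the two versions already agree closely; the 1/2 corrections matter mainly for cells with few successes or failures.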
In the numerical example which follows, the proportions p are all
small, the largest being 0.056. In this event, the logit of p is practically
the same as ln(p). In effect, we are fitting an additive model to the
logarithms of the p’s, i.e., a multiplicative model to the p’s themselves.
Further, with large samples the observed weight W = npq becomes W ≈ np = g
when p is small, each logit being weighted by the numerator of p.

16.12 — Numerical example. The data come from a large study of the 
relationship between smoking and death rates (15). About 248,000 male 
policyholders of U.S. Government Life Insurance answered questions 
by mail about their smoking habits. The data examined here are for men 
who reported themselves as non-smokers and for men who reported that 
they smoked cigarettes only. The cigarette smokers are classified by num- 
ber smoked per day, 1-9, 10-20, 21-39, and over 39. For each smoking 
class, the person-years of exposure were accumulated by 10-year age 
classes, using actual ages. That is, a man aged 52 on entry into the study 
would contribute 3 years in the 45-54 age class and additional years in 
the 55-64 age class. Most men were in the study for 8½ years.

In table 16.12.1, part (A) shows for each cell the number of deaths.
Part (B) gives the annual probability of death (× 10³) within each cell,
calculated from the number of deaths and the number of person-years of
exposure. Since the age distributions of different smoking classes were
not identical within a 10-year age class, the probabilities were computed
by standard actuarial methods so as to remove any effect of these
differences in age distribution.


TABLE 16.12.1

Numbers of Deaths and Annual Probabilities of Death (× 10³)

                     Reported Number of Cigarettes Smoked Per Day
Age (Years)       None       1-9     10-20     21-39   Over 39

                            (A) Number of deaths
35-44               47         7        90        83        10
45-54               38        11        67        80        14
55-64            2,617       389     2,117     1,656       406
65-74            3,728       586     2,458     1,416       258

                  (B) Annual probabilities of death (× 10³)
35-44             1.27      1.63      1.99      2.66      3.26
45-54             2.64      6.23      6.64      8.91     11.60
55-64            10.56     14.35     18.50     20.87     27.40
65-74            24.11     35.76     42.26     49.40     55.91


In every age group the probability of death rises sharply with each
additional amount smoked. As expected, the probability also increases
consistently with age within every smoking class. It is of interest to
examine whether the rate of increase in probability of death for each
additional amount smoked is constant at the different ages or changes from
age to age. If the rate of increase is constant, this implies a simple
multiplicative model for row and column effects: apart from sampling
errors, the probability p_ij for men in the ith age class and jth smoking
class is of the form

$$p_{ij} = \alpha_i \beta_j$$

In natural logs this gives the additive model

$$\ln(p_{ij}) = \ln \alpha_i + \ln \beta_j$$

Before attempting to fit this model it may be well to compute for 
each age group the ratio of the smoker to the non-smoker probabilities of 
death (table 16.12.2) to see if the data seem to follow the model. 


TABLE 16.12.2

Ratios of Smoker to Non-Smoker Probabilities of Death

                 Reported Number Smoked Per Day
Age           1-9     10-20     21-39   Over 39

35-44        1.28      1.57      2.09      2.57
45-54        2.36      2.51      3.37      4.39
55-64        1.36      1.75      1.98      2.59
65-74        1.48      1.75      2.05      2.32


The ratios agree fairly well for age groups 35-44, 55-64, and 65-74, 
but run substantially higher in age group 45-54. This comparison is an 
imprecise one, however, since the probabilities that provide the denomi- 
nators for the ages 35-44 and 45-54 are based on only 47 and 38 deaths, 
respectively. A stabler comparison is given by finding in each row the 
simple average of the five probabilities and using this as denominator for 
the row. This comparison (example 16.12.1) indicates that the non- 
smoker probability of death may have been unusually low in the age 
group 45-54. 
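The ratios in table 16.12.2 can be recomputed from part (B) of table 16.12.1 with a short Python sketch (the dictionary name is ours). Because the tabled probabilities are themselves rounded, a recomputed ratio may differ from the printed one in the last digit; for instance 6.64/2.64 = 2.515.

```python
# Annual probabilities of death (x 10^3), part (B) of table 16.12.1.
# Columns: None, 1-9, 10-20, 21-39, Over 39 cigarettes per day.
P_X1000 = {
    "35-44": [1.27, 1.63, 1.99, 2.66, 3.26],
    "45-54": [2.64, 6.23, 6.64, 8.91, 11.60],
    "55-64": [10.56, 14.35, 18.50, 20.87, 27.40],
    "65-74": [24.11, 35.76, 42.26, 49.40, 55.91],
}

# Divide each smoker probability by the non-smoker probability in its row.
ratios = {age: [p / ps[0] for p in ps[1:]] for age, ps in P_X1000.items()}

for age, row in ratios.items():
    print(age, "  ".join(f"{r:.2f}" for r in row))
```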

Omitting the multiplier 10³, the p values in table 16.12.1 range from
0.00127 to 0.05591. The assumption that these p’s are binomial is not
strictly correct. Within an individual cell the probability of dying
presumably varies somewhat from man to man. This variation makes the
variance of p for the cell less than the binomial variance (see example
16.12.2), but with small p’s the difference is likely to be negligible.
Further, as already mentioned, the p’s were adjusted in order to remove any
difference in age distribution within a 10-year class. Assuming the p’s
binomial, each ln p is weighted by the observed number of deaths in the
cell, as pointed out at the end of the preceding section.

The model is

$$Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$$

where the ε_ij are independent with means zero and variances 1/W_ij. The
fitted model is

$$\hat{Y}_{ij} = m + a_i + b_j,$$

the parameters being chosen so as to minimize ΣW(Y - Ŷ)².

TABLE 16.12.3

Arrangement of Data for Fitting an Additive Model

                          Reported Number of Cigarettes Per Day
Age                None       1-9     10-20     21-39   Over 39            Totals

35-44   W_1j*        47         7        90        83        10         237 = W_1.
        Y_1j†     0.239     0.489     0.688     0.978     1.182     169.570 = Y_1.
45-54   W_2j         38        11        67        80        14               210
        Y_2j      0.971     1.829     1.893     2.187     2.451           393.122
55-64   W_3j      2,617       389     2,117     1,656       406             7,185
        Y_3j      2.357     2.664     2.918     3.038     3.310        19,756.759
65-74   W_4j      3,728       586     2,458     1,416       258             8,446
        Y_4j      3.183     3.577     3.744     3.900     4.024        29,725.690

        W.j       6,430       993     4,732     3,235       688      16,078 = W..
        Ȳ.j       2.812     3.178     3.290     3.341     3.529    50,045.141 = Y..
                                                                      3.1126 = Ȳ..

* W_ij = cell weight = number of deaths.
† Y_ij = ln(p_ij), with p_ij in the units (× 10³) of table 16.12.1.

Table 16.12.3 shows the weights W_ij = number of deaths and the
Y_ij = ln(p_ij). The first step is to find the row and column totals of the
weights, and the weighted row and column totals of the Y_ij, namely

$$W_{i.} = \sum_j W_{ij}, \quad Y_{i.} = \sum_j W_{ij} Y_{ij}, \quad W_{.j} = \sum_i W_{ij}, \quad Y_{.j} = \sum_i W_{ij} Y_{ij},$$

$$W_{..} = \sum_i W_{i.}, \quad Y_{..} = \sum_i Y_{i.}$$

If we make the usual restrictions,

$$\sum_i W_{i.} a_i = \sum_j W_{.j} b_j = 0,$$

then m is the overall mean Y../W.. = Ȳ.. = 3.1126. Analogous to (16.7.1)
and (16.7.2), the normal equations for a_i and b_j are

$$W_{i.}(m + a_i) + W_{i1} b_1 + W_{i2} b_2 + \dots + W_{ic} b_c = Y_{i.} \qquad (16.12.1)$$

$$W_{.j}(m + b_j) + W_{1j} a_1 + W_{2j} a_2 + \dots + W_{rj} a_r = Y_{.j} \qquad (16.12.2)$$

Since we are not interested in attaching standard errors to the a_i or
b_j, these equations will be solved directly by successive approximations.
As first approximations to the quantities (m + b_j) we use the observed
column means Ȳ.j = Y.j/W.j, shown in table 16.12.3. Rewriting equation
(16.12.1) in the form

$$W_{i.}(m + a_i) = Y_{i.} + W_{i.}\bar{Y}_{..} - W_{i1}(m + b_1) - \dots - W_{ic}(m + b_c)$$

we obtain second approximations to the (m + a_i). For row 1,

$$237(m + a_1) = 169.570 + (237)(3.1126) - (47)(2.812) - \dots - (10)(3.529)$$

$$(m + a_1) = 144.153/237 = 0.608$$

These are then inserted in (16.12.2) in the form

$$W_{.j}(m + b_j) = Y_{.j} + W_{.j}\bar{Y}_{..} - W_{1j}(m + a_1) - \dots - W_{rj}(m + a_r)$$

and so on. The estimates settle down quickly. After three rounds the
following estimates were obtained:


Ages             35-44     45-54     55-64     65-74
m + a_i         0.5748    1.7130    2.7193    3.5538

No. per Day       None       1-9     10-20     21-39   Over 39
m + b_j         2.7433    3.1053    3.3052    3.4492    3.6612

As a check, at each stage the quantities ΣW_i.(m + a_i) and
ΣW.j(m + b_j) should agree with the grand total Y.. to within rounding
errors.

The expected value in each cell is conveniently computed as

$$\hat{Y}_{ij} = (m + a_i) + (m + b_j) - \bar{Y}_{..}$$

Table 16.12.4 shows the observed and expected values and the
deviations. The value of χ² = ΣW_ij(Y_ij - Ŷ_ij)² is 13.2 with
(4 - 1)(5 - 1) = 12 d.f., giving no indication of a lack of fit. The largest
deviation is the deficit -0.373 for non-smokers aged 45-54; this deviation
also makes the largest contribution to χ². The pattern of + and - signs in
the deviations has no striking features.

By finding the antilogs of the quantities (b_j - b_1), the ratios of the
smoker to the non-smoker annual probabilities of death as given by this
model are obtained. These ratios were 1.44, 1.75, 2.03, and 2.50,
respectively, for smokers of 1-9, 10-20, 21-39, and over 39 cigarettes per day.
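The fit by successive approximations can be sketched in Python. This is a minimal illustration assuming the data of table 16.12.1, with Y_ij taken as ln of the × 10³ probabilities and W_ij as the numbers of deaths; the function name `fit_additive` and the fixed number of rounds are our own choices. Because the sketch runs the rounds to convergence instead of stopping after three, the last decimals of its estimates may differ slightly from those in the text.

```python
import math

# Table 16.12.1. Rows: ages 35-44, 45-54, 55-64, 65-74.
# Columns: None, 1-9, 10-20, 21-39, Over 39 cigarettes per day.
DEATHS = [  # used as the weights W_ij
    [47, 7, 90, 83, 10],
    [38, 11, 67, 80, 14],
    [2617, 389, 2117, 1656, 406],
    [3728, 586, 2458, 1416, 258],
]
P_X1000 = [  # annual probabilities of death x 10^3
    [1.27, 1.63, 1.99, 2.66, 3.26],
    [2.64, 6.23, 6.64, 8.91, 11.60],
    [10.56, 14.35, 18.50, 20.87, 27.40],
    [24.11, 35.76, 42.26, 49.40, 55.91],
]


def fit_additive(Y, W, rounds=50):
    """Weighted fit of Y_ij ~ m + a_i + b_j by successive approximations.

    Returns the row quantities (m + a_i), the column quantities (m + b_j),
    the weighted grand mean m, and the fitted values.
    """
    r, c = len(Y), len(Y[0])
    w_row = [sum(W[i]) for i in range(r)]
    w_col = [sum(W[i][j] for i in range(r)) for j in range(c)]
    y_row = [sum(W[i][j] * Y[i][j] for j in range(c)) for i in range(r)]
    y_col = [sum(W[i][j] * Y[i][j] for i in range(r)) for j in range(c)]
    m = sum(y_row) / sum(w_row)  # Y.. / W..

    col = [y_col[j] / w_col[j] for j in range(c)]  # first approximations to m + b_j
    row = [0.0] * r
    for _ in range(rounds):
        # Rearranged (16.12.1): W_i.(m + a_i) = Y_i. + W_i. m - sum_j W_ij (m + b_j)
        row = [(y_row[i] + w_row[i] * m
                - sum(W[i][j] * col[j] for j in range(c))) / w_row[i]
               for i in range(r)]
        # Rearranged (16.12.2): W.j(m + b_j) = Y.j + W.j m - sum_i W_ij (m + a_i)
        col = [(y_col[j] + w_col[j] * m
                - sum(W[i][j] * row[i] for i in range(r))) / w_col[j]
               for j in range(c)]
    fitted = [[row[i] + col[j] - m for j in range(c)] for i in range(r)]
    return row, col, m, fitted


Y = [[math.log(p) for p in ps] for ps in P_X1000]
row, col, m, fitted = fit_additive(Y, DEATHS)

# Weighted sum of squared deviations, referred to chi-square with 12 d.f.
chi2 = sum(DEATHS[i][j] * (Y[i][j] - fitted[i][j]) ** 2
           for i in range(4) for j in range(5))
# Model ratios of smoker to non-smoker probabilities: antilogs of (b_j - b_1).
ratios = [math.exp(col[j] - col[0]) for j in range(1, 5)]
print(f"m = {m:.4f}, chi-square = {chi2:.1f} with 12 d.f.")
print("ratios:", [round(x, 2) for x in ratios])
```

Only the fitted values and the differences b_j - b_1 are determined uniquely; the split of the constant between the row and column quantities depends on the restrictions imposed.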

An example of the analysis of a proportion in a 2⁴ factorial classification
with only main effects important is given by Yates (16), using the logit
scale and observed weights. Dyke and Patterson (17) give the maximum
likelihood analysis of the same data. These authors define the logit as
½ ln(p/q).

Data containing a proportion in an R x C table may be regarded as
an R x C x 2 contingency table, or as a particular case of an R x C x T
contingency table. The definition and testing of three-factor interactions
in such tables has attracted much attention in recent years. Goodman
(18) gives a review and some simple computing methods.

TABLE 16.12.4

Observed and Expected Values of ln p

                             Reported Number of Cigarettes Per Day
Age                       None       1-9     10-20     21-39   Over 39

35-44   Y_ij             0.239     0.489     0.688     0.978     1.182
        Ŷ_ij             0.206     0.568     0.767     0.911     1.123
        Y_ij - Ŷ_ij     +0.033    -0.079    -0.079    +0.067    +0.059

45-54   Y_ij             0.971     1.829     1.893     2.187     2.451
        Ŷ_ij             1.344     1.706     1.906     2.050     2.262
        Y_ij - Ŷ_ij     -0.373    +0.123    -0.013    +0.137    +0.189

55-64   Y_ij             2.357     2.664     2.918     3.038     3.310
        Ŷ_ij             2.350     2.712     2.912     3.056     3.268
        Y_ij - Ŷ_ij     +0.007    -0.048    +0.006    -0.018    +0.042

65-74   Y_ij             3.183     3.577     3.744     3.900     4.024
        Ŷ_ij             3.184     3.546     3.746     3.890     4.102
        Y_ij - Ŷ_ij     -0.001    +0.031    -0.002    +0.010    -0.078

EXAMPLE 16.12.1 — In each row of table 16.12.1 find the unweighted average of the
probabilities and divide the individual probabilities by this number. Show that the results
are as follows:

Age         None      1-9     10-20     21-39   Over 39
35-44       0.59     0.75      0.92      1.23      1.51
45-54       0.37     0.86      0.92      1.24      1.61
55-64       0.58     0.78      1.01      1.14      1.49
65-74       0.58     0.86      1.02      1.19      1.35

The two numbers that seem most out of line are the low value 0.37 for (None, 45-54)
and the low value 1.35 for (Over 39, 65-74).
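A short Python sketch of the computation asked for in example 16.12.1, using part (B) of table 16.12.1 (the variable names are ours):

```python
# Annual probabilities of death (x 10^3), part (B) of table 16.12.1.
P_X1000 = {
    "35-44": [1.27, 1.63, 1.99, 2.66, 3.26],
    "45-54": [2.64, 6.23, 6.64, 8.91, 11.60],
    "55-64": [10.56, 14.35, 18.50, 20.87, 27.40],
    "65-74": [24.11, 35.76, 42.26, 49.40, 55.91],
}

relative = {}
for age, ps in P_X1000.items():
    row_average = sum(ps) / len(ps)  # unweighted average of the five probabilities
    relative[age] = [p / row_average for p in ps]
    print(age, "  ".join(f"{r:.2f}" for r in relative[age]))
```

Using the row average as the common denominator avoids leaning on a single non-smoker probability that rests on only a few deaths.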

EXAMPLE 16.12.2 — Suppose that there are three groups of n men with probabilities
of dying 0.01, 0.02, and 0.03. The variance of the total number who die is

$$n[(0.01)(0.99) + (0.02)(0.98) + (0.03)(0.97)] = 0.0586n$$

Hence the variance of the proportion of those dying out of 3n is 0.0586n/9n² = 0.00651/n.
For the combined sample the probability of dying is 0.02. If we wrongly regard the combined
sample as a single binomial of size 3n with p = 0.02, we would compute the variance
of the proportion dying as (0.02)(0.98)/3n = 0.00653/n. The actual variance is just a trifle
smaller than the binomial variance.

If there are k groups of n men with probabilities p_1, p_2, ..., p_k, show that the relation
between the actual variance V_a and the binomial variance V_b = p̄q̄/(nk) of the overall
proportion dying is

$$V_a = V_b - \frac{\sum_i (p_i - \bar{p})^2}{nk^2}$$
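The arithmetic of example 16.12.2 can be checked numerically with the Python sketch below. The group size n = 100 is a hypothetical choice; the printed values are multiplied by n so that they can be compared with the 0.00651/n and 0.00653/n of the text.

```python
def proportion_variances(probs, n):
    """Actual and (mistaken) binomial variance of the overall proportion dying
    when k = len(probs) groups of n men have death probabilities probs."""
    k = len(probs)
    p_bar = sum(probs) / k
    # Var(total deaths) = n * sum(p_i q_i); divide by (nk)^2 for the proportion.
    actual = n * sum(p * (1 - p) for p in probs) / (n * k) ** 2
    binomial = p_bar * (1 - p_bar) / (n * k)
    # The relation to be shown: V_a = V_b - sum (p_i - p_bar)^2 / (n k^2)
    via_relation = binomial - sum((p - p_bar) ** 2 for p in probs) / (n * k ** 2)
    return actual, binomial, via_relation


n = 100  # hypothetical number of men per group
actual, binomial, via_relation = proportion_variances([0.01, 0.02, 0.03], n)
print(f"actual   = {actual * n:.5f}/n")
print(f"binomial = {binomial * n:.5f}/n")
print(f"relation = {via_relation * n:.5f}/n")
```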

EXAMPLE 16.12.3 — In a sample of size n with population probability p, the true logit
is ln(p/q). The value Y = ln{(g + ½)/(n - g + ½)} is a relatively unbiased estimate of
ln(p/q) for expectations np and nq as low as 3. The weight W = (g + ½)(n - g + ½)/(n + 1)
corresponds to a variance

$$V = \frac{1}{W} = \frac{1}{g + 1/2} + \frac{1}{n - g + 1/2}$$

The quantity V is an almost unbiased estimate of the population variance of Y in small
samples. As an illustration, the values of the binomial probability P and of Y and V are
shown below for each value of g when n = 10, p = 0.3:

g          P          Y          V         Y²
0     0.0282     -3.046      2.095      9.278
1     0.1211     -1.846      0.772      3.408
2     0.2335     -1.224      0.518      1.498
3     0.2668     -0.762      0.419      0.581
4     0.2001     -0.367      0.376      0.135
5     0.1029      0.000      0.364      0.000
6     0.0368      0.367      0.376      0.135
7     0.0090      0.762      0.419      0.581
8     0.0014      1.224      0.518      1.498
9     0.0001      1.846      0.772      3.408
10    0.0000      3.046      2.095      9.278

The true logit is ln(0.3/0.7) = -0.8473. Verify that (i) the mean value of Y is -0.8497,
(ii) the variance of Y is 0.4968, (iii) the mean value of V is 0.5164, about 4% too large.
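The three verifications in example 16.12.3 can be carried out exactly (without the rounding in the table) by a small Python sketch; it needs Python 3.8 or later for `math.comb`, and the variable names are ours.

```python
import math

n, p = 10, 0.3
g_values = range(n + 1)
P = [math.comb(n, g) * p**g * (1 - p) ** (n - g) for g in g_values]  # binomial P(g)
Y = [math.log((g + 0.5) / (n - g + 0.5)) for g in g_values]
V = [1 / (g + 0.5) + 1 / (n - g + 0.5) for g in g_values]  # V = 1/W

true_logit = math.log(p / (1 - p))
mean_Y = sum(pr * y for pr, y in zip(P, Y))
var_Y = sum(pr * (y - mean_Y) ** 2 for pr, y in zip(P, Y))
mean_V = sum(pr * v for pr, v in zip(P, V))

print(f"true logit {true_logit:.4f}, mean of Y {mean_Y:.4f}")
print(f"variance of Y {var_Y:.4f}, mean of V {mean_V:.4f}")
print(f"mean V exceeds var Y by {100 * (mean_V / var_Y - 1):.1f}%")
```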
★ CHAPTER SEVENTEEN


Design and analysis 

of sampling 


17.1— Populations. In the 1908 paper in which he discovered the
t-test, “Student” opened with the following words: “Any experiment may
be regarded as forming an individual of a population of experiments which 
might be performed under the same conditions. A series of experiments 
is a sample drawn from this population. 

“Now any series of experiments is only of value in so far as it enables 
us to form a judgment as to the statistical constants of the population to 
which the experiments belong.” 

From the previous chapters in this book, this way of looking at data 
should now be familiar. The data obtained in an experiment are subject 
to variation, so that an estimate made from the data is also subject to varia- 
tion and is, hence, to some degree uncertain. You can visualize, however, 
that if you could repeat the experiment many times, putting all the results 
together, the estimate would ultimately settle down to some unchanging 
value which may be called the true or definitive result of the experiment. 
The purpose of the statistical analysis of an experiment is to reveal what 
the data can tell about this true result. The tests of significance and 
confidence limits which have appeared throughout this book are tools for 
making statements about the population of experiments of which your 
data are a sample. 

In such problems the sample is concrete, but the population may 
appear somewhat hypothetical. It is the population of experiments that 
might be performed, under the same conditions, if you possessed the 
necessary resources, time, and interest. 

In this chapter we turn to situations in which the population is
concrete and definite, and the problem is to obtain some desired information
about it. Examples are as follows:

Population                            Information Wanted
Ears of corn in a field               Average moisture content
Seeds in a large batch                Percentage germination
Water in a reservoir                  Concentration of certain bacteria
Third-grade children in a school      Average weight
If the population is small, it is sometimes convenient to obtain the
information by collecting the data for the whole of the population. More 
frequently, time and money can be saved by measuring only a sample 
drawn from the population. When the measurement is destructive, sam- 
pling is of course unavoidable. 

This chapter presents some methods for selecting a sample and for 
estimating population characteristics from the data obtained in the 
sample. During the past thirty years, sampling has come to be relied upon 
by a great variety of agencies, including government bureaus, market 
research organizations, and public opinion polls. Concurrently, much has 
been learned both about the theory and practice of sampling, and a num- 
ber of books devoted to sample survey methods have appeared (2, 3, 4, 5, 
13). In this chapter we explain the general principles of sampling and
show how to handle some of the simpler problems that are common in 
biological work. For more complex problems, references will be given. 

17.2 — A simple example. In the early chapters of this book, you drew 
samples so as to examine the amount of variation in results from one 
sample to another and to verify some important results in statistical 
theory. The same method will illustrate modern ideas about the selection
of samples from given populations. 

Suppose the population consists of N = 6 members, denoted by the
letters a to f. The six values of the quantity that is being measured are as
follows: a, 1; b, 2; c, 4; d, 6; e, 7; f, 16. The total for this population is 36.
A sample of three members is to be drawn in order to estimate this total.

One procedure already familiar to you is to write the letters a to f on
beans or slips of paper, mix them in some container, and draw out three 
letters. In sample survey work, this method of drawing is called simple 
random sampling, or sometimes random sampling without replacement (be- 
cause we do not put a letter back in the receptacle after it has been drawn). 
Obviously, simple random sampling gives every member an equal chance 
of being in the sample. It may be shown that the method also gives every 
combination of three different letters (e.g., aef or cde) an equal chance of 
constituting the sample. 

How good an estimate of the population total do we obtain by simple 
random sampling? We are not quite ready to answer this question. Al- 
though we know how the sample is to be drawn, we have not yet discussed 
how the population total is to be estimated from the results of the sample. 
Since the sample contains three members and the population contains six 
members, the simplest procedure is to multiply the sample total by 2, and
this is the procedure that will be adopted. You should note that any
sampling plan contains two parts: a rule for drawing the sample and a rule for
making the estimates from the results of the sample. 

We can now write down all possible samples of size 3, make the
estimate from each sample, and see how close these estimates lie to the true
value of 36. There are 20 different samples. Their results appear in table
17.2.1, where the successive columns show the composition of the sample,
the sample total, the estimated population total, and the error of estimate
(estimate minus true value).

Some samples, e.g., abf and cde , do very well, while others like abc 
give poor estimates. Since we do not know in any individual instance 
whether we will be lucky or unlucky in the choice of a sample, we appraise 
any sampling plan by looking at its average performance. 


TABLE 17.2.1

Results for All Possible Simple Random Samples of Size Three

             Sample     Estimate of          Error of
Sample        Total     Population Total     Estimate

abc               7           14                 -22
abd               9           18                 -18
abe              10           20                 -16
abf              19           38                  +2
acd              11           22                 -14
ace              12           24                 -12
acf              21           42                  +6
ade              14           28                  -8
adf              23           46                 +10
aef              24           48                 +12
bcd              12           24                 -12
bce              13           26                 -10
bcf              22           44                  +8
bde              15           30                  -6
bdf              24           48                 +12
bef              25           50                 +14
cde              17           34                  -2
cdf              26           52                 +16
cef              27           54                 +18
def              29           58                 +22

Average          18           36                   0

The average of the errors of estimate, taking account of their signs, is 
called the bias of the estimate (or, more generally, of the sampling plan). A 
positive bias implies that the sampling plan gives estimates that are on the 
whole too high; a negative bias, too low. From table 17.2.1 it is evident 
that this plan gives unbiased estimates, since the average of the 20 estimates 
is exactly 36 and consequently the errors of estimate add to zero. With 
simple random sampling this result holds for any population and any 
size of sample. Estimates that are unbiased are a desirable feature of a 
sampling plan. On the other hand, a plan that gives a small bias is not 
ruled out of consideration if it has other attractive features. 

As a measure of the accuracy of the sampling plan we use the mean
square error of the estimates taken about the true population value.
This is

$$\text{M.S.E.} = \frac{\sum(\text{Error of estimate})^2}{20} = \frac{3{,}504}{20} = 175.2$$

The divisor 20 is used instead of the divisor 19, because the errors are
measured from the true population value. To sum up, this plan gives an
estimate of the population total that is unbiased and has a standard error
√175.2 = 13.2. This standard error amounts to 37% of the true population
total; evidently the plan is not very accurate for this population.
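The enumeration behind table 17.2.1 and the mean square error can be reproduced with the Python sketch below. It applies the two rules of this sampling plan (draw 3 of the 6 members by simple random sampling; estimate the total as N/n = 2 times the sample total) to every possible sample; the variable names are ours.

```python
from itertools import combinations
from math import sqrt

POP = {"a": 1, "b": 2, "c": 4, "d": 6, "e": 7, "f": 16}
TRUE_TOTAL = sum(POP.values())  # 36
N, n = len(POP), 3

errors = []
for sample in combinations(sorted(POP), n):  # all 20 possible samples
    sample_total = sum(POP[u] for u in sample)
    estimate = (N / n) * sample_total  # rule for estimating: twice the sample total
    errors.append(estimate - TRUE_TOTAL)

bias = sum(errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)  # divisor 20: errors about the true value
print(f"{len(errors)} samples, bias {bias}, M.S.E. {mse}, s.e. {sqrt(mse):.1f}")
```

Since every combination of three members is equally likely under simple random sampling, averaging over all 20 samples gives the plan's exact bias and mean square error.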
In simple random sampling the selection of the sample is left to the
luck of the draw. No use is made of any knowledge that we possess about 
the members of the population. Given such knowledge, we should be 
able to improve upon simple random sampling by using the knowledge to 
guide us in the selection of the sample. Much of the research on sample 
survey methods has been directed towards taking advantage of available 
information about the population to be sampled. 

By way of illustration, suppose that before planning the sample we
expect that f will give a much higher value than any other member in the
population. How can we use this information? It is clear that the
estimate from the sample will depend to a considerable extent on whether f
falls in the sample or not. This statement can be verified from table
17.2.1; every sample containing f gives an overestimate and every sample
without f gives an underestimate.

The best plan is to make sure that f appears in every sample. We
can do this by dividing the population into two parts or strata. Stratum I,
which consists of f alone, is completely measured. In stratum II, containing
a, b, c, d, and e, we take a simple random sample of size 2 in order to
keep the total sample size equal to 3.

Some forethought is needed in deciding how to estimate the population
total. To use twice the sample total, as was done previously, gives
too much weight to f and, as already pointed out, will always produce an
overestimate of the true total. We can handle this problem by treating
the two strata separately. For stratum I we know the total (16) correctly,
since we always measure f. For stratum II, where 2 members are measured
out of 5, the natural procedure is to multiply the sample total in that
stratum by 5/2, or 2.5. Hence the appropriate estimate of the population
total is

16 + 2.5 (Sample total in stratum II)

These estimates are shown for the 10 possible samples in table 17.2.2.
Again we note that the estimate is unbiased. Its mean square error is

$$\frac{\sum(\text{Error of estimate})^2}{10} = \frac{487.50}{10} = 48.75$$

The standard error is √48.75 = 7.0, or 19% of the true total. This is a
marked improvement over the standard error of 13.2 that was obtained with
simple random sampling.

This sampling plan goes by the name of stratified random sampling
with unequal sampling fractions. The last part of the title denotes the fact
that stratum I is completely sampled, whereas stratum II is sampled at a
rate of 2 units out of 5, or 40%. Stratification allows us to divide the
population into sub-populations or strata that are less variable than the
original population, and to sample different parts of the population at
different rates when this seems advisable. It is discussed more fully in
sections 17.8 and 17.9.

TABLE 17.2.2

Results for All Possible Stratified Random Samples With the Unequal
Sampling Fractions Described in Text

           Sample Total in        Estimate           Error of
Sample     Stratum II (T_2)       16 + 2.5 T_2       Estimate

abf               3                  23.5              -12.5
acf               5                  28.5               -7.5
adf               7                  33.5               -2.5
aef               8                  36.0                0.0
bcf               6                  31.0               -5.0
bdf               8                  36.0                0.0
bef               9                  38.5               +2.5
cdf              10                  41.0               +5.0
cef              11                  43.5               +7.5
def              13                  48.5              +12.5

Average                              36.0                0.0

EXAMPLE 17.2.1 — In the preceding example, suppose you expect that both e and f
will give high values. You decide that the sample shall consist of e, f, and one member drawn
at random from a, b, c, d. Show how to obtain an unbiased estimate of the population total
and show that the standard error of this estimate is 7.7. (This sampling plan is not as
accurate as the plan in which f alone was placed in a separate stratum, because the actual value
for e is not very high.)

EXAMPLE 17.2.2 — If previous information suggests that f will be high, d and e
moderate, and a, b, and c small, we might try stratified sampling with three strata. The
sample consists of f, either d or e, and one chosen from a, b, and c. Work out the unbiased
estimate of the population total for each of the six possible samples and show that its
standard error is 3.9.
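The stratified plan of table 17.2.2 and the plans of examples 17.2.1 and 17.2.2 can all be checked by enumeration. The Python sketch below is our own illustration: it assumes an independent simple random sample of a fixed size within each stratum and the estimate Σ(N_h/n_h) × (sample total in stratum h), which is the rule used in the text (16 + 2.5 T_2 for the first plan). The function and dictionary names are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

POP = {"a": 1, "b": 2, "c": 4, "d": 6, "e": 7, "f": 16}


def stratified_errors(pop, strata, sizes):
    """Errors of estimate for every possible stratified random sample.

    A simple random sample of sizes[h] units is drawn independently from each
    stratum strata[h]; the estimate is the sum over strata of
    (N_h / n_h) x (sample total in stratum h).
    """
    true_total = sum(pop.values())
    choices = [list(combinations(s, k)) for s, k in zip(strata, sizes)]
    errors = []
    for sample in product(*choices):
        estimate = sum(len(s) / k * sum(pop[u] for u in pick)
                       for s, k, pick in zip(strata, sizes, sample))
        errors.append(estimate - true_total)
    return errors


plans = {
    "table 17.2.2": ([["f"], ["a", "b", "c", "d", "e"]], [1, 2]),
    "example 17.2.1": ([["e", "f"], ["a", "b", "c", "d"]], [2, 1]),
    "example 17.2.2": ([["f"], ["d", "e"], ["a", "b", "c"]], [1, 1, 1]),
}
std_errors = {}
for name, (strata, sizes) in plans.items():
    errors = stratified_errors(POP, strata, sizes)
    bias = sum(errors) / len(errors)
    std_errors[name] = sqrt(sum(e * e for e in errors) / len(errors))
    print(f"{name}: {len(errors)} samples, bias {bias:+.2f}, "
          f"s.e. {std_errors[name]:.1f}")
```

Each plan is unbiased; the standard errors differ only because the strata group the six values differently.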

17.3 — Probability sampling. The preceding examples were intended 
to introduce you to probability sampling. This is a general name given 
to sampling plans in which 

(i) every member of the population has a known probability of being
included in the sample,

(ii) the sample is drawn by some method of random selection con- 
sistent with these probabilities, 

(iii) we take account of these probabilities of selection in making the 
estimates from the sample. 

Note that the probability of selection need not be equal for all
members of the population: it is sufficient that these probabilities be known.
In the first example in the previous section, each member of the population
had an equal chance of being in the sample, and each member of the sample
received an equal weight in estimating the population total. But in the
second example, member f was given a probability 1 of appearing in the
sample, as against 2/5 for the rest of the population. This inequality in
the probabilities of selection was compensated for by assigning a weight
5/2 to these other members when making the estimate. The use of
unequal probabilities produces a substantial gain in precision for some types
of populations (see section 17.9).

Probability sampling has several advantages. By probability theory 
it is possible to study the biases and the standard errors of the estimates 
from different sampling plans. In this way much has been learned about 
the scope, advantages, and limitations of each plan. This information 
helps greatly in selecting a suitable plan for a particular sampling job. As
will be seen later, most probability sampling plans also enable the stan- 
dard error of the estimate, and confidence limits for the true population 
value, to be computed from the results of the sample. Thus, when a 
probability sample has been taken, we have some idea as to how accurate 
the estimates are. 

Probability sampling is by no means the only way of selecting a sam- 
ple. An alternative method is to ask someone who has studied the popu- 
lation to point out “average” or “typical” members, and then confine the 
sample to these members. When the population is highly variable and 
the sample is small, this method often gives more accurate estimates than 
probability sampling. Another method is to restrict the sampling to those 
members that are conveniently accessible. If bales of goods are stacked 
tightly in a warehouse, it is difficult to get at the inside bales of the pile 
and one is tempted to confine attention to the outside bales. In many 
biological problems it is hard to see how a workable probability sample 
can be devised, as in estimating, for instance, the number of house flies 
in a town, or of field mice in a wood, or of plankton in the ocean. 

One drawback of these alternative methods is that when the sample 
has been obtained, there is no way of knowing how accurate the estimate is. 
Members of the population picked out as typical by an expert may be 
more or less atypical. Outside bales may or may not be similar to interior 
bales. Probability sampling formulas for the standard error of the esti- 
mate or for confidence limits do not apply to these methods. Conse- 
quently, it is wise to use probability sampling unless there is a clear case 
that this is not feasible or is prohibitively expensive. 

17.4 — Listing the population. In order to apply probability sampling, 
we must have some way of subdividing the population into units, called 
sampling units, which form the basis for the selection of the sample. The 
sampling units must be distinct and non-overlapping, and they must to- 
gether constitute the whole of the population. Further, in order to make 
some kind of random selection of sampling units, we must be able to 
number or list all the units. As will be seen, we need not always write 
down the complete list but we must be in a position to construct it. Listing 
is easily accomplished when the population consists of 5,000 cards neatly 
arranged in a file, or 300 ears of corn lying on a bench, or the trees in a 
small orchard. But the subdivision of a population into sampling units 
that can be listed sometimes presents a difficult practical problem. 
Although we have spoken of the population as being concrete and
definite, there may be some vagueness about the population which does 
not become apparent until a sampling is being planned. Before we can 
come to grips with a population of farms or of nursing homes, we must 
define a farm or a nursing home. The definition may require much study 
and the final decision may have to be partly arbitrary. Two principles to 
keep in mind are that the definition should be appropriate to the purpose
of the sampling and that it should be usable in the field (i.e., the person
collecting the information should be able to tell what is in and what is out 
of the population as defined). 

Sometimes the available listings of farms, creameries, or nursing 
homes are deficient. The list may be out of date, having some members 
that no longer belong to our population and omitting some that do belong. 
The list may be based on a definition different from that which we wish to 
use for our population. These points should be carefully checked before 
using any list. It often pays to spend considerable effort in revising a list to 
make it complete and satisfactory, since this may be more economical than 
constructing a new list. Where a list covers only part of the population, 
one procedure is to sample this part by means of the list, and to construct 
a separate method of sampling for the unlisted part of the population. 
Stratified sampling is useful in this situation: all listed members are as- 
signed to one stratum and unlisted members to another. 

Preparing a list where none is available may require ingenuity and 
hard work. To cite an easy example, suppose that we wish to take a num- 
ber of crop samples, each 2 ft. x 2 ft., from a plot 200 ft. x 100 ft. Divide 
the length of the plot into 100 sections, each 2 ft., and the breadth into 
50 sections, each 2 ft. We thus set up a coordinate system that divides
the whole plot into 100 x 50 or 5,000 quadrats, each 2 ft. x 2 ft. To select
a quadrat by simple random sampling, we draw a random number between
1 and 100 and another random number between 1 and 50. These
coordinates locate the corner of the quadrat that is farthest from the origin
of our system. However, the problem becomes harder if the plot measures
163 ft. x 100 ft., and much harder if we have an irregularly shaped field.
Further, if we have to select a number of areas each 6 in. x 6 in. from a
large field, giving every area an equal chance of selection, the time spent
in selecting and locating the sample areas becomes substantial. Partly for
this reason, methods of systematic sampling (section 17.7) have come to
be favored in routine soil sampling (8).
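The coordinate method for the 200 ft. x 100 ft. plot can be sketched in Python. The function name is hypothetical, and the rule of rejecting a coordinate pair that has already been drawn is our own assumption for drawing several distinct quadrats without replacement; with it, every set of k quadrats is equally likely.

```python
import random


def select_quadrats(k, n_x=100, n_y=50, rng=random):
    """Draw k distinct 2 ft x 2 ft quadrats by simple random sampling.

    Each quadrat is identified by a pair of random coordinates
    (1..n_x along the length, 1..n_y along the breadth). A pair that has
    already been drawn is rejected, so sampling is without replacement.
    """
    if k > n_x * n_y:
        raise ValueError("sample larger than the number of quadrats")
    chosen = set()
    while len(chosen) < k:
        chosen.add((rng.randint(1, n_x), rng.randint(1, n_y)))
    return sorted(chosen)


rng = random.Random(1)  # seeded only so the illustration is reproducible
for x, y in select_quadrats(5, rng=rng):
    # (x, y) locates the corner farthest from the origin, at (2x ft, 2y ft).
    print(f"quadrat ({x}, {y}): far corner at {2 * x} ft by {2 * y} ft")
```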

Another illustration is a method for sampling (for botanical or chemi- 
cal analysis) the produce of a small plot that is already cut and bulked. 
The bulk is separated into two parts and a coin is tossed (or a random 
number drawn) to decide which part shall contain the sample. This part 
is then separated into two, and the process continues until a sample of 
about the desired size is obtained. At any stage it is good practice to make 
the two parts as alike as possible, provided this is done before the coin is 
tossed. A quicker method, of course, is to grab a handful of about the 
desired size; this is sometimes satisfactory but sometimes proves to be
biased. 

In urban sampling in the United States, the city block is often used as 
a sampling unit, a listing of the blocks being made from a map of the town. 
For extensive rural sampling, county maps have been divided into areas 
with boundaries that can be identified in the field and certain of these 
areas are selected to constitute the sample. The name 'area sampling' has
come to be associated with these and other methods in which the sampling 
unit is an area of land. Frequently the principal advantage of area sam- 
pling, although not the only one, is that it solves the problem of providing 
a listing of the population by sampling units. 

In many sampling problems there is more than one type or size of 
sampling unit into which the population can be divided. For instance, in 
soil sampling in which borings are taken, the size and shape of the borer 
can be chosen by the sampler. The same is true of the frame used to mark
out the area of land that is cut in crop sampling. In a dental survey of the 
fifth-grade school children in a city, we might regard the child as the 
sampling unit and select a sample of children from the combined school 
registers for the city. It would be administratively simpler, however, to 
take the school as the sampling unit, drawing a sample of schools and 
examining every fifth-grade child in the selected schools. This approach, 
in which the sampling unit consists of some natural group (the school) 
formed from the smaller units in which we are interested (the children), 
goes by the name of cluster sampling. 

If you are faced with a choice between different sampling units, the 
guiding rule is to try to select the one that returns the greatest precision 
for the available resources. For a fixed size of sample (e.g., 5% of the 
population), a large sampling unit usually gives less accurate results than 
a small unit, although there are exceptions. To counterbalance this, it is 
generally cheaper and easier to take a 5% sample with a large sampling 
unit than with a small one. A thorough comparison between two units is 
likely to require a special investigation, in which both sampling errors and 
costs (or times required) are computed for each unit. 

17.5 — Simple random sampling. In this and later sections, some of 
the best-known methods for selecting a probability sample will be pre- 
sented. The goal is to use a sampling plan that gives the highest precision 
for the resources to be expended, or, equivalently, that attains a desired 
degree of precision with the minimum expenditure of resources. It is
worthwhile to become familiar with the principal plans, since they are
designed to take advantage of any information that you have about the
structure of the population and about the costs of taking the sample. 

In section 17.2 you have already been introduced to simple random
sampling. This is a method in which the members of the sample are drawn
independently with equal probabilities. In order to illustrate the use of a 
table of random numbers for drawing a random sample, suppose that the 
population contains N = 372 members and that a sample of size n = 10 
is wanted. Select a three-digit starting number from table A 1, say the
number is 539 in row 11 of columns 80-82. Read down the column and
pick out the first ten three-digit numbers that do not exceed 372. These are
334, 365, 222, 345, 245, 272, 015, 038, 127, and 112. The sample consists of
the sampling units that carry these numbers in your listing of the popula-
tion. If any number appears more than once, ignore it on subsequent
appearances and proceed until ten different numbers have been found.

If the first digit in N is 1, 2, or 3, this method requires you to skip many
numbers in the table because they are too large. (In the above example
we had to cover 27 numbers in order to find ten for the sample.) This does
not matter if there are plenty of random numbers. An alternative is to use
all three-digit numbers up to 2 x 372 = 744. Starting at the same place,
the first ten numbers that do not exceed 744 are 539, 334, 615, 736, 365,
222, 345, 660, 431, and 427. Now subtract 372 from all numbers larger
than 372. This gives, for the sample, 167, 334, 243, 364, 365, 222, 345,
288, 59, and 55. With N = 189, for instance, we can use all numbers up to 
5 x 189 = 945 by this device, subtracting 189 or 378 or 567 or 756 as the 
case may be. 
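The subtraction device can be written as a short Python sketch. It is only an illustration, assuming the three-digit random numbers have already been read from the table; the function name `units_from_random_numbers` is hypothetical. It accepts numbers up to the largest multiple of N that fits in three digits (744 for N = 372, 945 for N = 189), maps each to a unit label by subtracting N as often as needed, and skips repeats.

```python
import random


def units_from_random_numbers(numbers, N, n):
    """Turn three-digit random numbers into n distinct unit labels in 1..N.

    Hypothetical helper. Numbers above the largest multiple of N that fits in
    three digits (and 000) are skipped, so every label stays equally likely.
    """
    limit = (999 // N) * N          # 744 for N = 372, 945 for N = 189
    chosen = []
    for r in numbers:
        if r == 0 or r > limit:
            continue                # too large: skip it, as in the text
        unit = (r - 1) % N + 1      # same as subtracting N, 2N, ... as needed
        if unit not in chosen:      # ignore repeats
            chosen.append(unit)
        if len(chosen) == n:
            break
    return chosen


# The ten numbers read from the table in the example above (N = 372).
table_numbers = [539, 334, 615, 736, 365, 222, 345, 660, 431, 427]
print(units_from_random_numbers(table_numbers, 372, 10))
# -> [167, 334, 243, 364, 365, 222, 345, 288, 59, 55]

# With a computer, a simple random sample can be drawn directly.
print(sorted(random.sample(range(1, 373), 10)))
```

The output reproduces the ten units found by hand. `random.sample` draws without replacement, which matches the rule of ignoring repeated numbers.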

As mentioned previously, simple random sampling leaves the selec- 
tion of the sample entirely to chance. It is often a satisfactory method 
when the population is not highly variable and, in particular, when esti-
mating proportions that are likely to lie between 20% and 80%. On the
other hand, if you have any knowledge of the variability in the population, 
such as that certain segments of it are likely to give higher responses than 
others, one of the methods to be described later may be more precise.

If $Y_i$ ($i = 1, 2, \ldots, N$) denotes the variable that is being studied, the
standard deviation, $\sigma$, of the population is defined as

$$\sigma = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{N - 1}}$$

where $\bar{Y}$ is the population mean of the $Y_i$ and the sum $\sum$ is taken over all
sampling units in the population.

Since $\bar{Y}$ denotes the population mean, we shall use $\bar{y}$ to denote the
sample mean. In a simple random sample of size n, the standard error
of $\bar{y}$ is

$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}\sqrt{1 - \phi}$$

where $\phi = n/N$ is the sampling fraction, i.e., the fraction of the population
that is included in the sample. (The sampling fraction is commonly
denoted by the symbol f, but $\phi$ is used here to avoid confusion with our
previous use of f for degrees of freedom.)

The term $\sigma/\sqrt{n}$ is already familiar to you: this is the usual formula for
the standard error of a sample mean. The second factor, $\sqrt{1 - \phi}$, is
known as the finite population correction. It enters because we are sam- 
pling from a population of finite size, N, instead of from an infinite popula- 
tion as is assumed in the usual theory. Note that this term makes the stan- 
dard error zero when n = N, as it should do, since we have then measured 
every unit in the population. In practical applications the finite popula- 
tion correction is close to 1 and can be omitted when n/N is less than 10%, 
i.e., when the sample includes less than 10% of the population. 

This result is remarkable. In a large population with a fixed amount
of variability (a given value of $\sigma$), the standard error of the mean depends
mainly on the size of sample and only to a minor extent on the fraction of
the population that is sampled. For given $\sigma$, the mean of a sample of 100 is
almost as precise when the population size is 200,000 as when the popula-
tion size is 20,000 or 2,000. Intuitively, some people feel that one cannot
possibly get accurate results from a sample of 100 out of a population of
200,000, because only a tiny fraction of the population has been measured.
Actually, whether the sampling plan is accurate or not depends primarily
on the size of $\sigma/\sqrt{n}$. This shows why sampling can bring about a great
reduction in the amount of measurement needed. 
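A quick numerical check of this claim is sketched below. It evaluates $\sigma_{\bar{y}} = (\sigma/\sqrt{n})\sqrt{1 - \phi}$ for n = 100 and the three population sizes mentioned, with $\sigma$ set to 1 (an arbitrary assumption, so the results are in units of $\sigma$). The standard error moves only from about 0.097 to 0.100 while N grows a hundredfold.

```python
import math


def se_mean(sigma, n, N):
    """Standard error of the mean of a simple random sample, including the fpc."""
    phi = n / N                               # sampling fraction
    return sigma / math.sqrt(n) * math.sqrt(1 - phi)


# sigma = 1 is a hypothetical choice, so results are in units of sigma.
for N in (2_000, 20_000, 200_000):
    print(f"N = {N:>7,}: SE = {se_mean(1.0, 100, N):.4f}")
```

Setting N equal to n makes the finite population correction, and hence the standard error, exactly zero, as the text notes.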

For the estimated standard error of the sample mean we have

$$s_{\bar{y}} = \frac{s}{\sqrt{n}}\sqrt{1 - \phi}$$

where s is the standard deviation of the sample, calculated in the usual way.

If the sample is used to estimate the population total of the variable
under study, the estimate is $N\bar{y}$ and its estimated standard error is

$$s_{N\bar{y}} = \frac{Ns}{\sqrt{n}}\sqrt{1 - \phi}$$


In simple random sampling for attributes, where every member of the
sample is classified into one of two classes, we take

$$s_p = \sqrt{\frac{pq}{n}}\sqrt{1 - \phi}$$

where p is the proportion of the sample that lies in one of the classes and
$q = 1 - p$. Suppose that 50 families are picked at random from a list of 432
families who possess telephones and that 10 of the families report that they
are listening to a certain radio program. Then p = 0.2, q = 0.8 and

$$s_p = \sqrt{\frac{(0.2)(0.8)}{50}}\sqrt{1 - \frac{50}{432}} = 0.053$$


If we ignore the finite population correction, we find $s_p = 0.057$.
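The telephone example can be reproduced with the short sketch below; `se_proportion` is a hypothetical helper name, and passing `N=None` simply drops the finite population correction.

```python
import math


def se_proportion(p, n, N=None):
    """s_p = sqrt(pq/n) * sqrt(1 - n/N); pass N=None to drop the fpc."""
    q = 1 - p
    se = math.sqrt(p * q / n)
    if N is not None:
        se *= math.sqrt(1 - n / N)            # finite population correction
    return se


p = 10 / 50                                   # 10 of 50 families listening
print(round(se_proportion(p, 50, 432), 3))    # with the fpc: 0.053
print(round(se_proportion(p, 50), 3))         # fpc ignored: 0.057
```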

The formula for $s_p$ holds only if each sampling unit is classified as a
whole into one of the two classes. If you are using cluster sampling and are
classifying individual elements within each cluster, a different formula for
$s_p$ must be used. For instance, in estimating the percentage of diseased
plants in a field from a sample of 360 plants, the formula above holds only if
the plants were selected independently and at random. To save time in
the field, however, we might have chosen 40 areas, each consisting of 3 
plants in each of 3 neighboring rows. With this method the area (a clus- 
ter of 9 plants) is the sampling unit. If the distribution of disease in the 
field were extremely patchy, it might happen that every area had either 
all 9 plants diseased or no plants diseased. In this event the sample of 40 
areas would be no more precise than a sample of 40 independently chosen 
plants, and we would be deceiving ourselves badly if we thought that we 
had a binomial sample of 360 plants. 

The correct procedure for computing $s_p$ is simple. Calculate p sepa-
rately for each area (or sampling unit) and apply to these p's the previous
formula for continuous variates. That is, if $p_i$ is the proportion diseased
in the ith area, the sample standard deviation is

$$s = \sqrt{\frac{\sum (p_i - p)^2}{n - 1}}$$

where n is now the number of areas (cluster units) and p is the mean of the
$p_i$. Then

$$s_p = \frac{s}{\sqrt{n}}\sqrt{1 - \phi}$$

For instance, suppose that the numbers of diseased plants in the 40 areas
were as given in table 17.5.1.

TABLE 17.5.1

Numbers of Diseased Plants (out of 9) in Each of 40 Areas

2 5 1 1 1 7 0 0 3 2 3 0 0 0 7 0 4 1 2 6
0 0 1 4 5 0 1 4 2 6 0 2 4 1 7 3 5 0 3 6

Grand total = 99



The standard deviation of the numbers of diseased plants in this sample is
2.331. Since the proportions of diseased plants in the 40 areas are found by
dividing the numbers in table 17.5.1 by 9, the standard deviation of the
proportions is

$$s = \frac{2.331}{9} = 0.259$$

Hence (assuming N large),

$$s_p = \frac{0.259}{\sqrt{40}} = 0.041$$
For comparison, the result given by the binomial formula will be
worked out. From the total in table 17.5.1, p = 99/360 = 0.275. The
binomial formula is

$$s_p = \sqrt{\frac{(0.275)(0.725)}{360}} = 0.024,$$

giving an overly optimistic notion of the precision of p.
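The whole comparison can be checked with a few lines of Python, using the 40 counts of table 17.5.1. This is a sketch of the calculation described in the text, not a general survey routine; as in the text, N is assumed large so the finite population correction is dropped.

```python
import math
import statistics

# Table 17.5.1: diseased plants (out of 9) in each of the 40 areas
counts = [2, 5, 1, 1, 1, 7, 0, 0, 3, 2, 3, 0, 0, 0, 7, 0, 4, 1, 2, 6,
          0, 0, 1, 4, 5, 0, 1, 4, 2, 6, 0, 2, 4, 1, 7, 3, 5, 0, 3, 6]
plants_per_area = 9
n = len(counts)                                   # 40 cluster units

# Cluster method: treat each area's proportion as one observation
s_counts = statistics.stdev(counts)               # divisor n - 1
s = s_counts / plants_per_area                    # SD of the proportions
s_p_cluster = s / math.sqrt(n)                    # N assumed large: fpc dropped

# Binomial formula, which wrongly treats the 360 plants as independent
p = sum(counts) / (plants_per_area * n)           # 99/360
s_p_binomial = math.sqrt(p * (1 - p) / (plants_per_area * n))

print(f"s = {s:.3f}, cluster s_p = {s_p_cluster:.3f}, binomial s_p = {s_p_binomial:.3f}")
# -> s = 0.259, cluster s_p = 0.041, binomial s_p = 0.024
```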

Frequently, the clusters are not all of the same size. This happens
when the sampling units are areas of land that contain different numbers
of the plants that are being classified. Let $m_i$ be the number of elements
that are classified in the ith unit, and $a_i$ the number that fall into a speci-
fied class, so that $p_i = a_i/m_i$. Then p, the overall proportion in the sam-
ple, is $(\sum a_i)/(\sum m_i)$, where each sum is taken over the n cluster units.

The formula for s, the standard deviation of the individual propor-
tions $p_i$, uses a weighted mean square of the deviations $(p_i - p)$, as follows:

$$s = \sqrt{\frac{1}{n - 1}\sum \left(\frac{m_i}{\bar{m}}\right)^2 (p_i - p)^2}$$

where $\bar{m} = \sum m_i/n$ is the average size of cluster in the sample. This formula
is an approximation, no correct expression for s being known in usable
form. As before, we have

$$s_p = \frac{s}{\sqrt{n}}\sqrt{1 - \phi}$$

For computing purposes, s is better expressed as

$$s = \frac{1}{\bar{m}}\sqrt{\frac{\sum a_i^2 - 2p\sum a_i m_i + p^2 \sum m_i^2}{n - 1}}$$

The sums of squares $\sum a_i^2$, $\sum m_i^2$ and the sum of products $\sum a_i m_i$ are cal-
culated without the usual corrections for the mean. The same value of s
is obtained whether the corrections for the mean are applied or not, but 
it saves time not to apply them. 
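The sketch below implements both forms of the approximate formula for clusters of unequal size and confirms that they agree. The helper name `cluster_sd` and the five areas of data are hypothetical, chosen only for illustration; each $m_i$ must be positive.

```python
import math


def cluster_sd(a, m):
    """Approximate s for proportions from clusters of unequal size.

    a[i] = elements in the specified class in cluster i, m[i] = elements
    classified in cluster i (each m[i] > 0). Returns p, s (direct form) and
    s (computing form); the two forms are algebraically identical.
    """
    n = len(a)
    p = sum(a) / sum(m)                       # overall proportion
    m_bar = sum(m) / n                        # average cluster size
    direct = sum((mi / m_bar) ** 2 * (ai / mi - p) ** 2
                 for ai, mi in zip(a, m)) / (n - 1)
    # Uncorrected sums of squares and products, as in the computing formula
    saa = sum(ai * ai for ai in a)
    sam = sum(ai * mi for ai, mi in zip(a, m))
    smm = sum(mi * mi for mi in m)
    computing = (saa - 2 * p * sam + p * p * smm) / ((n - 1) * m_bar ** 2)
    return p, math.sqrt(direct), math.sqrt(computing)


# Hypothetical data: 5 areas with different numbers of plants classified
a = [3, 0, 5, 1, 2]
m = [10, 8, 12, 6, 9]
p, s_direct, s_computing = cluster_sd(a, m)
s_p = s_direct / math.sqrt(len(a))            # N assumed large, fpc dropped
print(p, s_direct, s_computing, s_p)
```

The computing form needs only three running totals, which is why it was preferred for desk calculation.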

EXAMPLE 17.5.1 — If a sample of 4 from the 16 townships of a county has a standard
deviation 45, show that the standard error of the mean is 19.5.

EXAMPLE 17.5.2 — In the example presented in section 17.2 we had N = 6, n = 3,
and the values for the 6 members of the population were 1, 2, 4, 6, 7, and 16. The formula
for the true standard error of the estimated population total is

$$\sigma_{N\bar{y}} = \frac{N\sigma}{\sqrt{n}}\sqrt{1 - \phi}$$

Verify that this formula agrees with the result, 13.2, which we found by writing down all
possible samples.
EXAMPLE 17.5.3 — A simple random sample of size 100 is taken in order to estimate
some proportion (e.g., the proportion of males) whose value in the population is close to 1/2.
Work out the standard error of the sample proportion p when the size of the population is
(i) 200, (ii) 500, (iii) 1,000, (iv) 10,000, (v) 100,000. Note how little the standard error changes
for N greater than 1,000.

EXAMPLE 17.5.4 — Show that the coefficient of variation of the sample mean is the
same as that of the estimated population total.

EXAMPLE 17.5.5 — In simple random sampling for attributes, show that the standard
error of p, for given N and n, is greatest when p is 50%, but that the coefficient of variation of
p is largest when p is very small.

17.6 — Size of sample. At an early stage in the design of a sample, the 
question “How large a sample do I need?” must be considered. Although
a precise answer may not be easy to find, for reasons that will appear, 
there is a rational method of attack on the problem. 

Clearly, we want to avoid making the sample so small that the esti- 
mate is too inaccurate to be useful. Equally, we want to avoid taking a 
sample that is too large, in that the estimate is more accurate than we re- 
quire. Consequently, the first step is to decide how large an error we 
can tolerate in the estimate. This demands careful thinking about the 
use to be made of the estimate and about the consequences of a sizeable 
error. The figure finally reached may be to some extent arbitrary, yet 
after some thought samplers often find themselves less hesitant about 
naming a figure than they expected to be. 

The next step is to express the allowable error in terms of confidence 
limits. Suppose that L is the allowable error in the sample mean, and 
that we are willing to take a 5% chance that the error will exceed L. In 
other words, we want to be reasonably certain that the error will not ex- 
ceed L. Remembering that the 95% confidence limits computed from a 
sample mean, assumed approximately normally distributed, are

$$\bar{y} \pm \frac{2\sigma}{\sqrt{n}}$$

where we have ignored the finite population correction, we put

$$L = \frac{2\sigma}{\sqrt{n}}$$

This gives, for the required sample size,

$$n = \frac{4\sigma^2}{L^2}$$

In order to use this relation, we must have an estimate of the popula-
tion standard deviation, $\sigma$. Often a good guess can be made from the
results of previous samplings of this population or of other similar popula-
tions. For example, an experimental sample was taken in 1938 to estimate
the yield per acre of wheat in certain districts of North Dakota (7). For a
sample of 222 fields, the variance of the yield per acre from field to field
was $s^2 = 90.3$ (in bushels²). How many fields are indicated if we wish to
estimate the true mean yield within ±1 bushel, with a 5% risk that the
error will exceed 1 bushel? Then

$$n = \frac{4\sigma^2}{L^2} = \frac{4(90.3)}{(1)^2} = 361 \text{ fields}$$

If this estimate were being used to plan a sample in some later year, it 
would be regarded as tentative, since the variance between fields might 
change from year to year. 
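As a sketch, the sample-size rule for a mean is a one-line function; the name `n_for_mean` is hypothetical, and the variance from the earlier survey is used in place of $\sigma^2$, as in the text.

```python
def n_for_mean(variance, L):
    """n = 4 * sigma^2 / L^2 for a 95% limit of error L (fpc ignored)."""
    return 4 * variance / L ** 2


n = n_for_mean(90.3, 1.0)        # North Dakota wheat: s^2 = 90.3, L = 1 bushel
print(f"{n:.1f} fields")         # 361.2, which the text rounds to 361
```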

In default of previous estimates, Deming (3) has pointed out that $\sigma$
can be estimated from a knowledge of the highest and lowest values in the
population and a rough idea of the shape of the distribution. If h = (high-
est - lowest), then $\sigma = 0.29h$ for a uniform (rectangular) distribution,
$\sigma = 0.24h$ for a symmetrical distribution shaped like an isosceles triangle,
and $\sigma = 0.21h$ for a skew distribution shaped like a right triangle.

If the quantity to be estimated is a binomial proportion, the allowable
error, L, for 95% confidence probability is

$$L = 2\sqrt{\frac{pq}{n}}$$

The sample size required to attain a given limit of error, L, is therefore

$$n = \frac{4pq}{L^2} \tag{17.6.1}$$

In this formula, p, q, and L may be expressed either as proportions or as 
percentages, provided they are all expressed in the same units. The 
result necessitates an advance estimate of p. If p is likely to lie between 
35% and 65%, the advance estimate can be quite rough, since the product 
pq varies little for p lying between these limits. If, however, p is near zero 
or 100%, accurate determination of n requires a close guess about the 
value of p. 

We have ignored the finite population correction in the formulas pre- 
sented in this section. This is satisfactory for the majority of applications. 
If the computed value of n is found to be more than 10% of the population
size, N, a revised value n' which takes proper account of the correction
is obtained from the relation

$$n' = \frac{n}{1 + \phi}$$

where $\phi = n/N$ is computed from this first value of n.

For example, casual inspection of a batch of 480 seedlings indicates that
about 15% are diseased. Suppose we wish to know the size of sample
needed to determine p, the per cent diseased, to within ±5%, apart from
a 1-in-20 chance. Formula 17.6.1 gives

$$n = \frac{4(15)(85)}{(5)^2} = 204 \text{ seedlings}$$


At this point we might decide that it would be as quick to classify every
seedling as to plan a sample that is a substantial part of the whole batch.
If we decide on sampling, we make a revised estimate, n', as

$$n' = \frac{n}{1 + \phi} = \frac{204}{1 + \dfrac{204}{480}} = 143$$
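The two steps of the seedling calculation, formula 17.6.1 followed by the revision for the finite population correction, can be sketched as follows. The function names are hypothetical, and p, q and L are all taken in percentages, which the text permits as long as the units agree.

```python
def n_for_proportion(p, L):
    """Formula 17.6.1: n = 4pq/L^2, with p, q and L all in percentages."""
    q = 100 - p
    return 4 * p * q / L ** 2


def revise_for_fpc(n, N):
    """n' = n / (1 + phi), phi = n/N; used when n exceeds about 10% of N."""
    return n / (1 + n / N)


n = n_for_proportion(15, 5)          # seedlings: p about 15%, L = 5%
n_revised = revise_for_fpc(n, 480)   # batch of N = 480 seedlings
print(n, round(n_revised))           # 204.0 143
```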


The formulas presented in this section are appropriate for simple ran-
dom sampling. If some other sampling method is to be used, the general
principles for the determination of n remain the same, but the formula for 
the confidence limits, and hence the formula connecting L with n, will 
change. Formulas applicable to more complex methods of sampling can 
be obtained in books devoted to the subject, e.g., (2, 4). In practice, the 
formulas in this section are frequently used to provide a preliminary 
notion of the value of n, even if simple random sampling is not intended 
to be used. The values of n are revised later if the proposed method of 
sampling is markedly different in precision from simple random sampling. 

When more than one variable is to be studied, the value of n is first 
estimated separately for each of the most important variables. If these 
values do not differ by much, it may be feasible to use the largest of the 
n's. If the n's differ greatly, one method is to use the largest n, but to
measure certain items on only a sub-sample of the original sample, e.g., on
200 sampling units out of 1,000. In other situations, great disparity in
the n's is an indication that the investigation must be split into two or more
separate surveys. 


EXAMPLE 17.6.1 — A simple random sample of houses is to be taken to estimate the
percentage of houses that are unoccupied. The estimate is desired to be correct to within
±1%, with 95% confidence. One advance estimate is that the percentage of unoccupied
houses will be about 6%, another is that it will be about 4%. What sizes of sample are re-
quired on these two forecasts? What size would you recommend?

EXAMPLE 17.6.2 — The total number of rats in the residential part of a large city is to
be estimated with an error of not more than 20%, apart from a 1-in-20 chance. In a previous
survey, the mean number of rats per city block was nine and the sample standard deviation
was 19 (the distribution is extremely skew). Show that a simple random sample of around
450 blocks should suffice.

EXAMPLE 17.6.3 — West (12) quotes the following data for 556 full-time farms in
Seneca County, New York:

                                 Per Farm
                          Mean     Standard Deviation
Acres in corn              8.8            9.0
Acres in small grains     42.0            ?9.5
Acres in hay               ?9            26.9
If a coefficient of variation of up to 5% can be tolerated, show that a random sample
of about 240 farms is required to estimate the total acreage of each crop in the 556 farms with
this degree of precision. (Note that the finite population correction must be used.) This
example illustrates a result that has been reached by several different investigators: with small
farm populations such as counties, a substantial part of the whole population must be
sampled in order to obtain accurate estimates.

17.7 — Systematic sampling. In order to draw a 10% sample from a 
list of 730 cards, we might select a random number between 1 and 10, say
3, and pick every 10th card thereafter; i.e., the cards numbered 3, 13, 23,
and so on, ending with the card numbered 723. A sample of this kind
is known as a systematic sample, since the choice of its first member, 3,
determines the whole sample. 
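Drawing a systematic sample needs only one random number, as the sketch below shows. The helper name `systematic_sample` is hypothetical; `start` is fixed at 3 here to reproduce the card example, and otherwise it is drawn with `random.randint(1, k)`, which includes both end points.

```python
import random


def systematic_sample(N, k, start=None):
    """Every kth unit from a list numbered 1..N, beginning at a random start in 1..k."""
    if start is None:
        start = random.randint(1, k)   # the single random number needed
    return list(range(start, N + 1, k))


cards = systematic_sample(730, 10, start=3)   # the 10% sample of 730 cards
print(cards[:3], cards[-1], len(cards))       # [3, 13, 23] 723 73
```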

Systematic sampling has two advantages over simple random sam- 
pling. It is easier to draw, since only one random number is required, and 
it distributes the sample more evenly over the listed population. For this 
reason systematic sampling often gives more accurate results than simple 
random sampling. Sometimes the increase in accuracy is large. In 
routine sampling, systematic selection has become a popular technique. 

There are two potential disadvantages. If the population contains 
a periodic type of variation, and if the interval between successive units 
in the systematic sample happens to coincide with the wave length (or
a multiple of it), we may obtain a sample that is badly biased. To cite
extreme instances, a systematic sample of the houses in a city might con- 
tain far too many, or too few, corner houses; a systematic sample from a 
book of names might contain too many, or too few, names listed first on a 
page, who might be predominantly males, or heads of households, or 
persons of importance. A systematic sample of the plants in a field might 
have the selected plants at the same positions along every row. These 
situations can be avoided by being on the lookout for them and either 
using some other method of sampling or selecting a new random number 
frequently. In field sampling, we could select a new random number in 
each row. Consequently, it is well to know something about the nature 
of the variability in the population before deciding to use systematic 
sampling. 

The second disadvantage is that from the results of a systematic sam-
ple there is no reliable method of estimating the standard error of the sam-
ple mean. Textbooks on sampling give various formulas for $s_{\bar{y}}$ that may be
tried: each formula is valid for a certain type of population, but a formula
can be used with confidence only if we have evidence that the population
is of the type to which the formula applies. However, systematic sampling
often is a part of a more complex sampling plan in which it is possible to
obtain unbiased estimates of the sampling errors. 

EXAMPLE 17.7.1 — The purpose of this example is to compare simple random sam-
pling and systematic sampling of a small population. The following data are the weights of
maize (in 10-gm. units) for 40 successive hills lying in a single row: 104, 38, 105, 86, 67, 32,
47, 0, 80, 42, 37, 48, 85, 66, 110, 0, 73, 65, 101, 47, 0, 36, 16, 33, 22, 32, 33, 0, 35, 82, 37, 45, ?,
76, 45, 70, 70, 63, 83, 34. To save you time, the population standard deviation is given as 30.1.
Compute the standard deviation of the mean of a simple random sample of 4 hills. A sys-
tematic sample of 4 hills can be taken by choosing a random number between 1 and 10 and
taking every 10th hill thereafter. Find the mean $\bar{y}_{sy}$ for each of the 10 possible systematic
samples and compute the standard deviation of these means about the true mean $\bar{Y}$ of the
population. Note that the formula for the standard deviation is

$$\sigma(\bar{y}_{sy}) = \sqrt{\frac{\sum (\bar{y}_{sy} - \bar{Y})^2}{10}}$$

Verify that the standard deviation of the estimate is about 8% lower with systematic sam-
pling. To what do you think this difference is due?


17.8 — Stratified sampling. There are three steps in stratified sam-
pling:

(1) The population is divided into a number of parts, called strata. 

(2) A sample is drawn independently in each part. 

(3) As an estimate of the population mean, we use 


$$\bar{y}_{st} = \frac{\sum N_h \bar{y}_h}{N}$$

where $N_h$ is the total number of sampling units in the hth stratum, $\bar{y}_h$ is
the sample mean in the hth stratum and $N = \sum N_h$ is the size of the popu-
lation. Note that we must know the values of the $N_h$ (i.e., the sizes of the
strata) in order to compute this estimate.

Stratification is commonly employed in sampling plans for several 
reasons. It can be shown that differences between the strata means in 
the population do not contribute to the sampling error of the estimate
$\bar{y}_{st}$. In other words, the sampling error of $\bar{y}_{st}$ arises solely from variations
among sampling units that are in the same stratum. If we can form strata
so that a heterogeneous population is divided into parts each of which is
fairly homogeneous, we may expect a gain in precision over simple random
sampling. In taking 24 soil or crop samples from a rectangular field, we
might divide the field into 12 compact plots, and draw 2 samples at random
from each plot. Since a small piece of land is usually more homogeneous
than a large piece, this stratification will probably bring about an increase
in precision, although experience indicates that in this application the
increase will be modest rather than spectacular. To estimate total wheat
acreage from a sample of farms, we might stratify by size of farm, using
any information available for this purpose. In this type of application the
gain in precision is frequently large.

In stratified sampling, we can choose the size of sample that is to be 
taken from any stratum. This freedom of choice gives us scope to do an
efficient job of allocating resources to the sampling within strata. In some
applications, this is the principal reason for the gain in precision from
stratification. Further, when different parts of the population present
different problems of listing and sampling, stratification enables these
problems to be handled separately. For this reason, hotels and large
apartment houses are frequently placed in a separate stratum in a sample 
of the inhabitants of a city. 

We now consider the estimate from stratified sampling and its stan- 
dard error. For the population mean, the estimate given previously may 
be written 

$$\bar{y}_{st} = \frac{1}{N}\sum N_h \bar{y}_h = \sum W_h \bar{y}_h$$

where $W_h = N_h/N$ is the relative weight attached to the stratum. Note
that the sample means, $\bar{y}_h$, in the respective strata are weighted by the
sizes, $N_h$, of the strata. The arithmetic mean of the sample observations
is no longer the estimate except in one important special case. This occurs
with proportional allocation, when we sample the same fraction from every
stratum. With proportional allocation,

$$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \cdots = \frac{n_h}{N_h} = \frac{n}{N}$$

It follows that

$$W_h = \frac{N_h}{N} = \frac{n_h}{n}$$

Hence,

$$\bar{y}_{st} = \sum W_h \bar{y}_h = \frac{\sum n_h \bar{y}_h}{n} = \bar{y},$$

since $\sum n_h \bar{y}_h$ is the total of all observations in the sample. With propor-
tional allocation, we are saved the trouble of computing a weighted mean:
the sample is self-weighting.
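The self-weighting property can be checked numerically. In this sketch the helper name `stratified_mean` and all the data are hypothetical: three strata of 200, 300 and 500 units with 4% sampled from each, so the weighted estimate must equal the plain mean of the 40 observations.

```python
def stratified_mean(N_h, samples):
    """y_st = sum(W_h * ybar_h), with W_h = N_h / N."""
    N = sum(N_h)
    return sum(Nh / N * sum(ys) / len(ys) for Nh, ys in zip(N_h, samples))


# Hypothetical population: strata of 200, 300, 500 units, 4% sampled from each
N_h = [200, 300, 500]
samples = [
    [12, 15, 11, 14, 13, 10, 12, 13],                                  # n_1 = 8
    [20, 22, 19, 25, 21, 23, 18, 24, 22, 20, 21, 23],                  # n_2 = 12
    [31, 29, 35, 30, 33, 28, 32, 34, 30, 31,
     29, 33, 32, 30, 31, 34, 28, 33, 30, 32],                          # n_3 = 20
]
y_st = stratified_mean(N_h, samples)
all_obs = [y for ys in samples for y in ys]
y_bar = sum(all_obs) / len(all_obs)
print(y_st, y_bar)   # equal, apart from rounding in the last digit
```

With any other allocation the two numbers would in general differ, and only `y_st` estimates the population mean without bias.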

In order to avoid two levels of subscripts, we use the symbol $s(\bar{y}_{st})$ to
denote the estimated standard error of $\bar{y}_{st}$. Its value is

$$s(\bar{y}_{st}) = \sqrt{\sum \frac{W_h^2 s_h^2}{n_h}}$$

where $s_h^2$ is the sample variance in the hth stratum, i.e.,

$$s_h^2 = \frac{\sum_i (Y_{hi} - \bar{y}_h)^2}{n_h - 1}$$

where $Y_{hi}$ is the ith member of the sample from the hth stratum. This
formula for the standard error of $\bar{y}_{st}$ assumes that simple random sampling
is used within each stratum and does not include the finite population
correction. If the sampling fractions $\phi_h$ exceed 10% in some of the strata,
we use the more general formula

$$s(\bar{y}_{st}) = \sqrt{\sum \frac{W_h^2 s_h^2}{n_h}(1 - \phi_h)} \tag{17.8.1}$$
With proportional allocation the sampling fractions $\phi_h$ are all equal and
the general formula simplifies to

$$s(\bar{y}_{st}) = \sqrt{\frac{\sum W_h s_h^2}{n}}\sqrt{1 - \phi}$$

If, further, the population variances are the same in all strata (a reason-
able assumption in some applications), we obtain an additional simplifica-
tion to

$$s(\bar{y}_{st}) = \frac{s_w}{\sqrt{n}}\sqrt{1 - \phi}$$

This result is the same as that for the standard error of the mean with
simple random sampling, except that $s_w$, the pooled standard deviation
within strata, appears in place of the sample standard deviation, s. In
practice, $s_w$ is computed from an analysis of variance of the data.

As an example of proportional allocation, the data in table 17.8.1 
come from an early investigation by Clapham (1) of the feasibility of 
sampling for estimating the yields of small cereal plots. A rectangular plot 
of wheat was divided transversely into three equal strata. Ten samples, 
each a meter length of a single row, were chosen by simple random sam- 
pling from each stratum. The problem is to compute the standard error 
of the estimated mean yield per meter of row. 


TABLE 17.8.1 

Analysis of Variance of a Stratified Random Sample 
(Wheat grain yields — gm. per meter) 


Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square
Total                          29                 8,564            295.3
Between strata                  2                 2,073          1,036.5
Within strata                  27                 6,491            240.4


In this example, $s_w = \sqrt{240.4} = 15.5$, and n = 30. Since the sample
is only a negligible part of the whole plot, n/N is negligible and

$$s(\bar{y}_{st}) = \frac{15.5}{\sqrt{30}} = 2.83 \text{ gm.}$$
How effective was the stratification? From the analysis of variance
it is seen that the mean square between strata is over four times as large
as that within strata. This is an indication of real differences in level of
yield from stratum to stratum. It is possible to go further, and estimate
what the standard error of the mean would have been if simple random
sampling had been used without any stratification. With simple random
sampling, the corresponding formula for the standard error of the mean is

$$s_{\bar{y}} = \frac{s}{\sqrt{n}}$$

where s is the ordinary sample standard deviation. In the sample under
discussion, s is $\sqrt{295.3}$ (from the total mean square in table 17.8.1). Hence,
as an estimate of the standard error of the mean under simple random
sampling, we might take

$$s_{\bar{y}} = \frac{\sqrt{295.3}}{\sqrt{30}} = 3.14 \text{ gm.},$$


as compared with 2.83 gm. for stratified random sampling. Stratification 
has reduced the standard error by about 10%. 
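The comparison can be recomputed from the sums of squares and degrees of freedom in table 17.8.1 alone, as in this short sketch. It applies the same approximate method as the text, so it inherits the caveat discussed next.

```python
import math

n = 30                                        # 3 strata x 10 meter-lengths of row
ms_within = 6491 / 27                         # within-strata mean square, 240.4
ms_total = 8564 / 29                          # total mean square, 295.3
se_stratified = math.sqrt(ms_within / n)      # s_w / sqrt(n); n/N negligible
se_simple = math.sqrt(ms_total / n)           # approximate SRS standard error
print(f"stratified {se_stratified:.2f} gm., simple random {se_simple:.2f} gm., "
      f"reduction {1 - se_stratified / se_simple:.0%}")
```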

This comparison is not quite correct, for the rather subtle reason that 
the value of s was calculated from the results of a stratified sample and not, 
as it should have been, from the results of a simple random sample. Valid 
methods of making the comparison are described for all types of stratified 
sampling in (2). The approximate method which we used is close enough 
when the stratification is proportional and at least ten sampling units are 
drawn from every stratum. 


EXAMPLE 17.8.1 — In the example of stratified sampling given in section 17.2, show
that the estimate which we used for the population total was $N\bar{y}_{st}$. From formula 17.8.1 for
the standard error of $\bar{y}_{st}$, verify that the variance of the estimated population total is 48.75, as
found directly in section 17.2. (Note that stratum I makes no contribution to this variance
because $n_h = N_h$ in that stratum.)

17.9 — Choice of sample sizes in the individual strata. It is some-
times thought that in stratified sampling we should sample the same frac-
tion from every stratum; i.e., we should make $n_h/N_h$ the same in all strata,
using proportional allocation. A more thorough analysis of the problem
shows, however, that the optimum allocation is to take $n_h$ proportional to
$N_h\sigma_h/\sqrt{c_h}$, where $\sigma_h$ is the standard deviation of the sampling units in the
hth stratum, and $c_h$ is the cost of sampling per unit in the hth stratum. This
method of allocation gives the smallest standard error of the estimated
mean $\bar{y}_{st}$ for a given total cost of taking the sample. The rule tells us to
take a larger sample, as compared with proportional allocation, in a
stratum that is unusually variable ($\sigma_h$ large), and a smaller sample in a
stratum where sampling is unusually expensive ($c_h$ large). Looked at in
this way, the rule is consistent with common sense, as statistical rules 
always are if we think about them carefully. The rule reduces to pro- 
portional allocation when the standard deviation and the cost per unit 
are the same in all strata. 
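A minimal sketch of the allocation rule in Python, assuming the $\sigma_h$ and $c_h$ are supplied as advance estimates. The function name `optimum_allocation` is ours; it returns unrounded sizes so that the rounding decision stays with the sampler. The strata in the usage lines are hypothetical.

```python
import math

def optimum_allocation(n, N, sigma, cost=None):
    """Divide a total sample of n units among strata, taking n_h
    proportional to N_h * sigma_h / sqrt(c_h). Returns unrounded sizes."""
    if cost is None:
        cost = [1.0] * len(N)   # equal costs: n_h proportional to N_h * sigma_h
    weights = [Nh * sh / math.sqrt(ch) for Nh, sh, ch in zip(N, sigma, cost)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata of 100 and 300 units with equal standard deviations.
print(optimum_allocation(40, [100, 300], [5, 5]))            # [10.0, 30.0]
print(optimum_allocation(40, [100, 300], [5, 5], [1, 4]))    # [16.0, 24.0]
```

With equal $\sigma_h$ and $c_h$ the split is exactly proportional (10 and 30). Making the second stratum four times as costly per unit halves its weight, because of the square root, and moves the split to 16 and 24.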

In order to apply the rule, advance estimates are needed both of the 
relative standard deviations and of the relative costs in different strata. 
These estimates need not be highly accurate; rough estimates often give 
results satisfactorily near to the optimum allocation. When a population 
is sampled repeatedly, the estimates can be obtained from the results of 
previous samplings. Even when a population is sampled for the first 
time, it is sometimes obvious that some strata are more accessible to 
sampling than others. In this event it pays to hazard a guess about the 
differences in costs. In other situations we are unable to predict with any 
confidence which strata will be more variable or more costly, or we think
that any such differences will be small. Proportional allocation is then
used.

There is one common situation in which disproportionate sampling 
pays large dividends. This occurs when the principal variable that is
being measured has a highly skewed or asymmetrical distribution. Usual- 
ly, such populations contain a few sampling units that have large values 
for this variable and many units that have small values. Variables that 
are related to the sizes of economic institutions are often of this type, for 
instance, the total sales of grocery stores, the number of patients per hos- 
pital, the amounts of butter produced by creameries, family incomes, and 
prices of houses. 

With populations of this type, stratification by size of institution is 
highly effective, and the optimum allocation is likely to be much better 
than proportional allocation. As an illustration, table 17.9.1 shows 
data for the number of students per institution in a population consisting 
of the 1,019 senior colleges and universities in the United States. The 
data, which apply mostly to the 1952-1953 academic year, might be used
as background information for planning a sample designed to give a quick
estimate of total registration in some future year. The institutions are
arranged in four strata according to size.


TABLE 17.9.1

Data for Total Registrations Per Senior College or University,
Arranged in Four Strata

Stratum: Number of    Number of      Total Registration   Mean per      Standard Deviation
Students per          Institutions   for the Stratum      Institution   per Institution
Institution           N_h                                 Ȳ_h           σ_h

Less than 1,000           661           292,671              443             236
1,000-3,000               205           345,302            1,684             625
3,000-10,000              122           672,728            5,514           2,008
Over 10,000                31           573,693           18,506          10,023
Total                   1,019         1,884,394

Note that the 31 largest universities, about 3% in number, have 30% 
of the students, while the smallest group, which contains 65% of the in- 
stitutions, contributes only 15% of the students. Note also that the 
within-stratum standard deviation, $\sigma_h$, increases rapidly with increasing
size of institution.

Table 17.9.2 shows the calculations needed for choosing the optimum
sample sizes within strata. We are assuming equal costs per unit within
all strata. The products, $N_h\sigma_h$, are formed and added over all strata.
Then the relative sample sizes, $N_h\sigma_h/\sum N_h\sigma_h$, are computed. These ratios,
when multiplied by the intended sample size $n$, give the sample sizes in
the individual strata.


TABLE 17.9.2

Calculations for Obtaining the Optimum Sample Sizes in Individual Strata

Stratum: Number of   Number of      N_h σ_h    Relative       Actual         Sampling
Students             Institutions              Sample Sizes   Sample Sizes   Rate (%)
                     N_h

Less than 1,000          661        155,996      .1857            65            10
1,000-3,000              205        128,125      .1526            53            26
3,000-10,000             122        244,976      .2917           101            83
Over 10,000               31        310,713      .3700            31           100
Total                  1,019        839,810     1.0000           250



As a consequence of the large standard deviation in the stratum with 
the largest universities, the rule requires 37% of the sample to be taken 
from this stratum. Suppose we are aiming at a total sample size of 250. 
The rule then calls for (0.37)(250) or 92 universities from this stratum 
although the stratum contains only 31 universities in all. With highly 
skewed populations, as here, the optimum allocation may demand 100% 
sampling, or even more than this, of the largest institutions. When this 
situation occurs, the best procedure is to take 100% of the “large” stratum,
and employ the rule to distribute the remainder of the sample over the 
other strata. Following this procedure, we include in the sample all 31 
largest institutions, leaving 219 to be distributed among the first three 
strata. In the first stratum, the size of sample is 


$$219\left(\frac{0.1857}{0.1857 + 0.1526 + 0.2917}\right) = 65$$


The allocations, shown in the second column from the right of table
17.9.2, call for over 80% sampling in the second largest group of institutions
(101 out of 122), but only a 10% sample of the small colleges. In
practice we might decide, for administrative convenience, to take a 100%
sample in the second largest group as well as in the largest.
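The steps behind table 17.9.2, including the rule of taking an over-allocated stratum completely and re-allocating the remainder, can be sketched in Python as below. This loop is our own illustration rather than a procedure given in the text; it assumes equal costs per unit and repeats the capping step in case a re-allocation pushes another stratum past 100%.

```python
# Table 17.9.1: numbers of institutions N_h and standard deviations sigma_h.
N = [661, 205, 122, 31]
sigma = [236, 625, 2008, 10023]

def capped_allocation(n, N, sigma):
    """Equal-cost optimum allocation (n_h proportional to N_h * sigma_h),
    except that any stratum whose share would exceed N_h is taken
    completely and the rest of the sample is re-allocated."""
    take_all = set()
    while True:
        rest = [h for h in range(len(N)) if h not in take_all]
        remaining = n - sum(N[h] for h in take_all)
        total = sum(N[h] * sigma[h] for h in rest)
        alloc = {h: remaining * N[h] * sigma[h] / total for h in rest}
        over = {h for h, a in alloc.items() if a > N[h]}
        if not over:
            break
        take_all |= over
    alloc.update({h: N[h] for h in take_all})
    return [alloc[h] for h in range(len(N))]

products = [Nh * sh for Nh, sh in zip(N, sigma)]
sizes = [round(a) for a in capped_allocation(250, N, sigma)]
rates = [round(100 * s / Nh) for s, Nh in zip(sizes, N)]
print(sum(products))   # 839810
print(sizes)           # [65, 53, 101, 31]
print(rates)           # [10, 26, 83, 100]
```

Rounding to whole institutions is done only at the end. Here the three re-allocated sizes happen to add back to 219, but in general a final adjustment of one unit may be needed.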

It is worthwhile to ask: Is the optimum allocation much superior to 
proportional allocation? If not, there is little value in going to the extra 
trouble of calculating and using the optimum allocation. We cannot, of 
course, answer this question for a future sample that is not yet taken, but 
we can compare the two methods of allocation for the 1952-1953 registra- 
tions. To do this, we use the data in tables 17.9.1 and 17.9.2 and the 
standard error formulas in section 17.8 to compute the standard errors of 
the estimated population totals by the two methods. These standard 
errors are found to be 26,000 for the optimum allocation, as against
107,000 for proportional allocation. If simple random sampling had been
used, with no stratification, a similar calculation shows that the corresponding
standard error would have been 216,000. The reduction in the
standard error due to stratification, and the additional reduction due to 
the optimum allocation, are both striking. In an actual future sampling 
based on this stratification, the gains in precision would presumably be 
slightly less than these figures indicate. 
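The three standard errors can be recomputed along the following lines. This is a sketch under stated assumptions: it treats the $\sigma_h$ of table 17.9.1 as the true within-stratum standard deviations (with divisor $N_h - 1$ when rebuilding the population variance for simple random sampling), assumes the stratified formula of section 17.8 has the form $\sqrt{\sum N_h(N_h - n_h)\sigma_h^2/n_h}$, and uses unrounded proportional sizes $nN_h/N$. The same calculation answers example 17.9.1.

```python
import math

# Table 17.9.1: stratum sizes, stratum totals, and standard deviations.
N = [661, 205, 122, 31]
totals = [292_671, 345_302, 672_728, 573_693]
sigma = [236, 625, 2008, 10023]
N_all, n = sum(N), 250

def se_stratified_total(n_h):
    """sqrt( sum N_h (N_h - n_h) sigma_h^2 / n_h ) for a stratified sample."""
    return math.sqrt(sum(Nh * (Nh - nh) * s ** 2 / nh
                         for Nh, s, nh in zip(N, sigma, n_h)))

optimum = [65, 53, 101, 31]                     # from table 17.9.2
proportional = [n * Nh / N_all for Nh in N]     # unrounded n N_h / N

# Simple random sampling: rebuild the population variance S^2 (divisor N - 1)
# from the within-stratum and between-stratum sums of squares.
grand_mean = sum(totals) / N_all
within = sum((Nh - 1) * s ** 2 for Nh, s in zip(N, sigma))
between = sum(Nh * (T / Nh - grand_mean) ** 2 for Nh, T in zip(N, totals))
S2 = (within + between) / (N_all - 1)
se_srs = math.sqrt(N_all * (N_all - n) * S2 / n)

se_opt = se_stratified_total(optimum)
se_prop = se_stratified_total(proportional)
print(round(se_opt, -3), round(se_prop, -3), round(se_srs, -3))   # 26000.0 107000.0 216000.0
```

A fully enumerated stratum ($n_h = N_h$) contributes nothing through the $(N_h - n_h)$ factor, which is why the largest universities drop out of the optimum-allocation figure.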

EXAMPLE 17.9.1 — For the population of colleges and universities discussed in this
section it was stated that a stratified sample of 250 institutions, with proportional allocation,
would have a standard error of 107,000 for the estimated total registration in all 1,019
institutions. Verify this statement from the data in table 17.9.1. Note that the standard
error of the estimated population total, with proportional allocation, is

$$\sqrt{\frac{N(N - n)}{n}\sum W_h \sigma_h^2}$$

17.10 — Stratified sampling for attributes. If an attribute is being
sampled, the estimate appropriate to stratified sampling is

$$p_{st} = \sum W_h p_h,$$

where $p_h$ is the sample proportion in stratum $h$ and $W_h = N_h/N$ is the stratum
weight. To find the standard error of $p_{st}$ we substitute $p_h q_h$ for $s_h^2$ in
the formulas previously given in section 17.8.

As an example, consider a sample of 692 families in Iowa to deter- 
mine, among other things, how many had vegetable gardens in 1943. 
The families were arranged in three strata — urban, rural non-farm, and 
farm — because it was anticipated that the three groups might show differ- 
ences in the frequency and size of vegetable gardens. The data are given 
in table 17.10.1.

The numbers of families were taken from the 1940 census. The 
sample was allotted roughly in proportion to the number of families per 
stratum, a sample of 1 per 1,000 being aimed at. 

The weighted mean percentage of Iowa families having gardens was 
estimated as 

$$\sum W_h p_h = (0.445)(72.7) + (0.230)(94.8) + (0.325)(96.6) = 85.6\%$$

TABLE 17.10.1

Numbers of Vegetable Gardens Among Iowa Families, Arranged in Three Strata

                  Number of    Weight   Number in   Number With   Percentage
Stratum           Families              Sample      Gardens       With Gardens
                  N_h          W_h      n_h                       p_h

Urban             312,393      0.445       300          218           72.7
Rural non-farm    161,077      0.230       155          147           94.8
Farm              228,354      0.325       237          229           96.6
Total             701,824      1.000       692          594



This is practically the same as the sample mean percentage, 594/692 
or 85.8%, because allocation was so close to proportional. 

For the estimated variance of the estimated mean, we have

$$\sum W_h^2 p_h q_h/n_h = (0.445)^2(72.7)(27.3)/300 + \cdots = 1.62$$

The standard error, then, is 1.27%.

With a sample of this size, the estimated mean will be approximately
normally distributed; the 95% confidence limits may be set as

$$85.6 \pm (2)(1.27): \quad 83.1\% \text{ and } 88.1\%$$
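The estimate, its variance, and the limits can be reproduced in Python from the counts in table 17.10.1. This sketch works from the raw counts rather than the rounded percentages, so the last digit can differ slightly from the hand calculation.

```python
import math

# Table 17.10.1: weights W_h, sample sizes n_h, and families with gardens.
W = [0.445, 0.230, 0.325]
n = [300, 155, 237]
gardens = [218, 147, 229]

p = [100 * g / nh for g, nh in zip(gardens, n)]   # percentage with gardens, p_h
q = [100 - ph for ph in p]                         # q_h = 100 - p_h

p_st = sum(w * ph for w, ph in zip(W, p))                            # sum W_h p_h
var = sum(w ** 2 * ph * qh / nh for w, ph, qh, nh in zip(W, p, q, n))  # sum W_h^2 p_h q_h / n_h
se = math.sqrt(var)

print(f"p_st = {p_st:.1f}%, variance = {var:.2f}, s.e. = {se:.2f}%")
print(f"limits: {p_st - 2 * se:.1f}% and {p_st + 2 * se:.1f}%")
```

It prints 85.6%, 1.62 and 1.27%, with limits of about 83.0% and 88.1%. The lower limit differs from 83.1% only because the text rounds the mean to 85.6 before subtracting 2.54.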


For the optimum choice of the sample sizes within strata, we should
take $n_h$ proportional to $N_h\sqrt{p_h q_h/c_h}$. If the cost of sampling is about the
same in all strata, as is true in many surveys, this implies that the fraction
sampled, $n_h/N_h$, should be proportional to $\sqrt{p_h q_h}$. Now the quantity
$\sqrt{pq}$ changes little as $p$ ranges from 25% to 75% (it falls only from 0.50 at
$p = 50\%$ to about 0.43 at $p = 25\%$ or 75%). Consequently proportional
allocation is often highly efficient in stratified sampling for attributes.
The optimum allocation produces a substantial reduction in the
standard error, as compared with proportional allocation, only when some
of the $p_h$ are close to zero or 100%, or when there are differential costs.

The example on vegetable gardens departs from the strict principles 
of stratified sampling in that the strata sizes and weights were not known 
exactly, being obtained from census data three years previously. Errors 
in the strata weights reduce the gain in precision from stratification and 
make the standard formulas inapplicable. It is believed that in this 
example these disturbances are of negligible importance. Discussions of 
stratification when errors in the weights are present are given in (2) and (10).

EXAMPLE 17.10.1 — In stratified sampling for attributes, the optimum sample distribution,
with equal costs per unit in all strata, follows from taking $n_h$ proportional to $N_h\sqrt{p_h q_h}$.
It follows that the actual value of $n_h$ is

$$n_h = n\,\frac{N_h\sqrt{p_h q_h}}{\sum N_h\sqrt{p_h q_h}}$$

In the Iowa vegetable garden survey, suppose that the $p_h$ values found in the sample can be
assumed to be the same as those in the population. Show that the optimum sample distribution
gives sample sizes of 445, 115, and 132 in the respective strata, and that the standard
error of the estimated percentage with gardens would then be 1.17%, as compared with
1.27% in the sample itself.
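A Python sketch of the answer to this example, assuming (as the example does) that the sample percentages equal the population values. Because $W_h = N_h/N$, the weights can stand in for the $N_h$ without changing the proportions.

```python
import math

W = [0.445, 0.230, 0.325]     # stratum weights from table 17.10.1
p = [72.7, 94.8, 96.6]        # sample percentages, assumed equal to population values
n_total = 692

# Equal-cost optimum: n_h proportional to N_h sqrt(p_h q_h). Using W_h = N_h / N
# in place of N_h gives the same proportions.
share = [w * math.sqrt(ph * (100 - ph)) for w, ph in zip(W, p)]
n_opt = [round(n_total * s / sum(share)) for s in share]

# Variance of the estimated percentage under this allocation: sum W_h^2 p_h q_h / n_h.
var = sum(w ** 2 * ph * (100 - ph) / nh for w, ph, nh in zip(W, p, n_opt))
print(n_opt, round(math.sqrt(var), 2))   # [445, 115, 132] 1.17
```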

17.11 — Sampling in two stages. Consider the following miscellaneous
group of sampling problems: (1) a study of the vitamin A content of 
butter produced by creameries, (2) a study of the protein content of wheat 
in the wheat fields in an area, (3) a study of red blood cell counts in a 
population of men aged 20-30, (4) a study of insect infestation of the leaves 
of the trees in an orchard, and (5) a study of the number of defective 
teeth in third-grade children in the schools of a large city. What do these 
investigations have in common? First, in each study an appropriate sampling
unit suggests itself naturally — the creamery, the field of wheat, the
individual man, the tree, and the school. Secondly, and this is the im- 
portant point, in each study the chosen sampling units can be sub-sampled 
instead of being measured completely. Indeed, sub-sampling is essential 
in the first three studies. No one is going to allow us to take all the butter 
produced by a creamery in order to determine vitamin A content, or 
all the wheat in a field for the protein determination, or all the blood in a 
man in order to make a complete count of his red cells. In the insect 
infestation study, it might be feasible, although tedious, to examine all 
leaves on any selected tree. If the insect distribution is spotty, however, 
we would probably decide to take only a small sample of leaves from any 
selected tree in order to include more trees. In the dental study, we could
take all the third-grade children in any selected school or we could cover 
a larger sample of schools by examining only a sample of children from 
the third grade in each selected school. 

This type of sampling is called sampling in two stages, or sometimes
sub-sampling. The first stage is the selection of a sample of primary sampling
units: the creameries, wheat fields, and so on. The second stage is
the taking of a sub-sample of second-stage units, or sub-units, from each
selected primary unit.

As illustrated by these examples, the two-stage method is sometimes 
the only practicable way in which the sampling can be done. Even when
there is a choice between sub-sampling the units and measuring them com- 
pletely, two-stage sampling gives the sampler greater scope, since he can 
choose both the size of the sample of primary units and the size of the sam- 
ple that is taken from a primary unit. In some applications an important 
advantage of two-stage sampling is that it facilitates the problem of listing 
the population. Often it is relatively easy to obtain a list of the primary 
units, but difficult or expensive to list all the sub-units. To list the trees 
in an orchard and draw a sample of them is usually simple, but the problem
of making a random selection of the leaves on a tree may be very
troublesome. With two-stage sampling this problem is faced only for
those trees that are in the sample. No complete listing of all leaves in the
orchard is required. 

In the discussion of two-stage sampling we assume at first that the
primary units are of approximately the same size. A simple random sample
of $n_1$ primary units is drawn, and the same number $n_2$ of sub-units is
selected from each primary unit in the sample. The estimated standard
error of the sample mean $\bar y$ per sub-unit is then given by the formula

$$s_{\bar y} = \sqrt{\frac{\sum (\bar y_i - \bar y)^2}{n_1(n_1 - 1)}},$$

where $\bar y_i$ is the mean per sub-unit in the $i$th primary unit. This formula
does not include the finite population correction, but is reliable enough
provided that the sample contains less than 10% of all primary units.
Note that the formula makes no use of the individual observations on the
sub-units, but only of the primary unit means $\bar y_i$. If the sub-samples are
taken for a chemical analysis, a common practice is to composite the
sub-sample and make one chemical determination for each primary unit.
With data of this kind we can still calculate $s_{\bar y}$.
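A short Python function for this standard error, assuming an equal $n_2$ in every primary unit and ignoring the finite population correction, as the text does. Only the primary-unit means are needed, so it applies equally to composited samples. The three means in the usage line are made up purely to show the call.

```python
import math
from statistics import mean

def two_stage_se(unit_means):
    """s_ybar = sqrt( sum (ybar_i - ybar)^2 / (n1 (n1 - 1)) ),
    computed from the primary-unit means only (no finite population correction)."""
    n1 = len(unit_means)
    ybar = mean(unit_means)
    ss = sum((y - ybar) ** 2 for y in unit_means)
    return math.sqrt(ss / (n1 * (n1 - 1)))

# Made-up primary-unit means, used only to show the call.
print(round(two_stage_se([1.0, 2.0, 3.0]), 4))   # 0.5774
```

Equivalently, this is the ordinary standard deviation of the $n_1$ unit means divided by $\sqrt{n_1}$.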

In section 10.13 you learned about the “components of variance” 
technique, and applied it to a problem in two-stage sampling. The data 
were concentrations of calcium in turnip greens, four determinations be- 
ing made for each of three leaves. The leaf can be regarded as the primary 
sampling unit, and the individual determination as the sub-unit. By apply- 
ing the components of variance technique, you were able to see how the 
variance of the sample mean was affected by variation between determina- 
tions on the same leaf and by variation from leaf to leaf. You could also 
predict how the variance of the sample mean would change with different 
numbers of leaves and of determinations per leaf in the experiment. 

Since this technique is of wide utility in two-stage sampling, we shall
repeat some of the results. The observation on any sub-unit is considered
to be the sum of two independent terms. One term, associated with the
primary unit, has the same value for all second-stage units in the primary
unit, and varies from one primary unit to another with variance $\sigma_1^2$. The
second term, which serves to measure differences between second-stage
units, varies independently from one sub-unit to another with variance
$\sigma_2^2$. Suppose that a sample consists of $n_1$ primary units, from each of
which $n_2$ sub-units are drawn. Then the sample as a whole contains $n_1$
independent values of the first term, whereas it contains $n_1 n_2$ independent
values of the second term. Hence the variance of the sample mean $\bar y$ per
sub-unit is

$$\sigma_{\bar y}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2}$$

The two components of variance, $\sigma_1^2$ and $\sigma_2^2$, can be estimated from
the analysis of variance of a two-stage sample that has been taken. Table
17.11.1 gives the analysis of variance for a study by Immer (6), whose
object was to develop a sampling technique for the determination of the
sugar percentage in field experiments on sugar beets. Ten beets were
chosen from each of 100 plots in a uniformity trial, the plots being the
primary units. The sugar percentage was obtained separately for each
beet. In order to simulate conditions in field experiments, the Between
plots mean square was computed as the mean square between plots within
blocks of 5 plots. This mean square gives the experimental error variance
that would apply in a randomized blocks experiment with 5 treatments.


TABLE 17.11.1

Analysis of Variance of Sugar Percentage of Beets (on a Single-Beet Basis)

                                          Degrees of   Mean      Parameters
Source of Variation                       Freedom      Square    Estimated

Between plots (primary units)                 80       2.9254    σ_2² + 10σ_1²
Between beets (sub-units) within plots       900       2.1374    σ_2²


The estimate of $\sigma_1^2$, the Between plots component of variance, is

$$s_1^2 = \frac{2.9254 - 2.1374}{10} = 0.0788,$$

the divisor 10 being the number of beets (sub-units) taken per plot. As an
estimate of $\sigma_2^2$, the within-plots component, we have

$$s_2^2 = 2.1374$$

Hence, if a new experiment is to consist of $n_1$ replications, with $n_2$ beets
sampled from each plot, the predicted variance of a treatment mean is

$$s_{\bar y}^2 = \frac{0.0788}{n_1} + \frac{2.1374}{n_1 n_2}$$


We shall illustrate two of the questions that can be answered from 
these data. How accurate are the treatment means in an experiment with 
6 replications and 5 beets per plot? For this experiment we would expect 


$$s_{\bar y} = \sqrt{\frac{0.0788}{6} + \frac{2.1374}{30}} = 0.29\%$$


The sugar percentage figure for a treatment mean would be correct to
within ±(2)(0.29) or 0.58%, with 95% confidence, assuming $\bar y$ approximately
normally distributed.
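The component estimates and the predicted standard error for 6 replications with 5 beets per plot can be reproduced as follows (Python; the function name `var_mean` is ours).

```python
import math

# Mean squares from table 17.11.1 (single-beet basis); 10 beets were taken per plot.
ms_between, ms_within, beets_per_plot = 2.9254, 2.1374, 10

s2sq = ms_within                                   # estimate of sigma_2^2
s1sq = (ms_between - ms_within) / beets_per_plot   # estimate of sigma_1^2

def var_mean(n1, n2):
    """Predicted variance of a treatment mean: s1^2 / n1 + s2^2 / (n1 n2)."""
    return s1sq / n1 + s2sq / (n1 * n2)

se = math.sqrt(var_mean(6, 5))                     # 6 replications, 5 beets per plot
print(round(s1sq, 4), round(se, 2), round(2 * se, 2))   # 0.0788 0.29 0.58
```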

If the standard error of a treatment mean is not to exceed 0.2%, what
combinations of $n_1$ and $n_2$ are allowable? We must have

$$\frac{0.0788}{n_1} + \frac{2.1374}{n_1 n_2} = (0.2)^2 = 0.04$$

Since $n_1$ and $n_2$ are whole numbers, they will not satisfy this equation
exactly: we must make sure that the left side of the equation does not
exceed 0.04. You can verify that with 4 replications ($n_1 = 4$), there must
be 27 beets per plot; with 8 replications, 9 beets per plot are sufficient;
and with 10 replications, 7 beets per plot. As one would expect, the intensity
of sub-sampling decreases as the intensity of sampling is increased.
The total size of sample also decreases from 108 beets when $n_1 = 4$ to 70
beets when $n_1 = 10$.
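The allowable combinations come from solving the inequality for the smallest whole $n_2$ at each $n_1$. A Python sketch, using the components estimated above:

```python
import math

s1sq, s2sq = 0.0788, 2.1374     # variance components from table 17.11.1
target = 0.2 ** 2               # (0.2%)^2: largest allowable variance

def smallest_n2(n1):
    """Fewest beets per plot with s1^2/n1 + s2^2/(n1 n2) <= target,
    or None when even unlimited sub-sampling cannot reach the target."""
    slack = target - s1sq / n1
    if slack <= 0:
        return None
    return math.ceil(s2sq / (n1 * slack))

for n1 in (4, 8, 10):
    n2 = smallest_n2(n1)
    print(n1, n2, n1 * n2)      # replications, beets per plot, total beets
```

It gives 27, 9 and 7 beets per plot (108, 72 and 70 beets in all) for 4, 8 and 10 replications. With a single replication no amount of sub-sampling suffices, since 0.0788 alone exceeds 0.04.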

17.12 — The allocation of resources in two-stage sampling. The last 
example illustrates a general property of two-stage samples. The same 
standard error can be attained for the sample mean by using various 
combinations of values of $n_1$ and $n_2$. Which of these choices is the best?
The answer depends, naturally, on the cost of adding an extra primary
unit to the sample (in this case an extra replication) relative to that of
adding an extra sub-unit in each primary unit (in this case an extra beet
in each plot). Similarly, in the turnip greens example (section 10.13, page
280) the best sampling plan depends on the relative costs of taking an
extra leaf and of making an extra determination per leaf. Obviously, if 
it is cheap to add primary units to the sample but expensive to add sub- 
units, the most economical plan will be to have many primary units and 
few (perhaps only one) sub-units per primary unit. For a general solution 
to this problem, however, we require a more exact formulation of the 
costs of various alternative plans. 

In many sub-sampling studies the cost of the sample (apart from 
fixed overhead costs) can be approximated by a relation of the form 

$$\text{cost} = c_1 n_1 + c_2 n_1 n_2$$

The factor $c_1$ is the average cost per primary unit of those elements of
cost that depend solely on the number of primary units and not on the
amount of sub-sampling. The factor $c_2$, on the other hand, is the average
cost per sub-unit of those constituents of cost that are directly proportional
to the total number of sub-units.

If advance estimates of these constituents of cost are made from a 
preliminary study, an efficient job of selecting the best amounts of sam- 
pling and sub-sampling can be done. The problem may be posed in two 
different ways. In some studies we specify the desired variance V for the 
sample mean, and would like to attain this as cheaply as possible. In 
other applications the total cost C that must not be exceeded is imposed 
upon us, and we want to get as small a value of V as we can for this outlay. 
These two problems have basically the same solution. In either case we
want to minimize the product

$$VC = \left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_1 n_2}\right)(c_1 n_1 + c_2 n_1 n_2)$$

Upon expansion, this becomes

$$VC = (s_1^2 c_1 + s_2^2 c_2) + n_2 s_1^2 c_2 + \frac{s_2^2 c_1}{n_2}$$

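It can be shown that this expression has its smallest value when

$$n_2 = \sqrt{\frac{c_1 s_2^2}{c_2 s_1^2}}$$

(Only the last two terms depend on $n_2$; their product, $s_1^2 s_2^2 c_1 c_2$, does not, and
the sum of two positive terms with a fixed product is smallest when they are equal.)
This result gives an estimate of the best number of sub-units (beets) per
primary unit (plot). The value of $n_1$ is found by solving either the cost
equation or the variance equation for $n_1$, depending on whether cost or
variance has been preassigned.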

In the sugar beet example we had $s_1^2 = 0.0788$, $s_2^2 = 2.1374$, from
which

$$n_2 = \sqrt{\frac{2.1374}{0.0788}\,\frac{c_1}{c_2}} = 5.2\sqrt{\frac{c_1}{c_2}}$$

In this study, cost data were not reported. If $c_1$ were to include the
cost of the land and the field operations required to produce one plot, it
would be much greater than $c_2$. Evidently a fairly large number of beets
per plot would be advisable. In practice, factors other than the sugar 
percentage determinations must also be taken into account in deciding 
on costs and number of replications in sugar beet experiments. 
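The formula for $n_2$ is easy to tabulate. The Python sketch below evaluates it for the sugar beet components at a few cost ratios $c_1/c_2$; these ratios are hypothetical, since the study reported no cost data.

```python
import math

def optimum_n2(s1sq, s2sq, c1, c2):
    """Continuous optimum number of sub-units per primary unit,
    n2 = sqrt(c1 s2^2 / (c2 s1^2)), which minimizes the product VC."""
    return math.sqrt(c1 * s2sq / (c2 * s1sq))

# Sugar beet components from table 17.11.1; the cost ratios c1/c2 are hypothetical.
for ratio in (1, 10, 100):
    print(ratio, round(optimum_n2(0.0788, 2.1374, c1=ratio, c2=1), 1))
```

It prints 5.2, 16.5 and 52.1 beets per plot for $c_1/c_2$ = 1, 10 and 100: even with equal costs about 5 beets per plot would be indicated, and the optimum grows with the square root of $c_1/c_2$.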

In the turnip greens example (section 10.13, page 280), $n_1$ is the number
of leaves and $n_2$ the number of determinations of calcium concentration
per leaf. Also, in the present notation,

$$s_1^2 = s_A^2 = 0.0724, \qquad s_2^2 = s^2 = 0.0066$$

Hence, the most economical number of determinations per leaf is estimated
to be

$$n_2 = \sqrt{\frac{c_1 s_2^2}{c_2 s_1^2}} = \sqrt{\frac{0.0066}{0.0724}\,\frac{c_1}{c_2}} = 0.30\sqrt{\frac{c_1}{c_2}}$$

In practice, $n_2$ must be a whole number, and the smallest value it can have
is 1. This equation shows that $n_2 = 1$, i.e., one determination per leaf,
unless $c_1$ is at least 25 times $c_2$ (at $c_1 = 25c_2$ the formula gives $n_2 = 1.5$).
Actually, since $c_2$ includes the cost of
the chemical determinations, it is likely to be greater than $c_1$. The
relatively large variation among leaves and the cost considerations both 
point to the choice of one determination per leaf. 
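Because $n_2$ must be a whole number, one can also compare the variance-cost product directly at $n_2 = 1, 2, 3, \ldots$ instead of rounding the square-root formula. The Python sketch below does this for the turnip greens components; the cost ratios are hypothetical. By this exact comparison, two determinations per leaf first beat one when $c_1/c_2$ exceeds $2s_1^2/s_2^2 \approx 22$, consistent with the rough figure of 25 obtained from the formula.

```python
def vc(n2, s1sq, s2sq, c1, c2):
    """Variance-cost product from the expansion in the text:
    (s1^2 c1 + s2^2 c2) + n2 s1^2 c2 + s2^2 c1 / n2 (it does not involve n1)."""
    return (s1sq * c1 + s2sq * c2) + n2 * s1sq * c2 + s2sq * c1 / n2

def best_n2(s1sq, s2sq, c1, c2, max_n2=20):
    """Whole number of sub-units per primary unit (at least 1) with the smallest VC."""
    return min(range(1, max_n2 + 1), key=lambda k: vc(k, s1sq, s2sq, c1, c2))

# Turnip greens: s1^2 = 0.0724 (between leaves), s2^2 = 0.0066 (between determinations).
for ratio in (1, 10, 30):      # hypothetical values of c1/c2
    print(ratio, best_n2(0.0724, 0.0066, c1=ratio, c2=1))
```

The best whole $n_2$ is 1 at $c_1/c_2$ = 1 and 10, and 2 at $c_1/c_2 = 30$.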

This example also illustrates that a choice of $n_2$ can often be made
from the equation even when information about relative costs is not very
definite. This is because the equation often leads to the same value of $n_2$
for a wide range of ratios of $c_1$ to $c_2$. Brooks (14) gives helpful tables for
this situation. The values of $n_2$ are subject to sampling errors; for a
discussion, see (2).

In section 10.14 you studied an example of three-stage sampling of 
turnip green plants. The first stage was represented by plants, the second 
by leaves within plants, and the third by determinations within a leaf. In 
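the notation of this section, the estimated variance of the sample mean is

$$s_{\bar y}^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_1 n_2} + \frac{s_3^2}{n_1 n_2 n_3}$$

Copying the equation given in section 10.14, we have

$$s_{\bar y}^2 = \frac{0.3652}{n_1} + \frac{0.1610}{n_1 n_2} + \frac{0.0067}{n_1 n_2 n_3}$$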

To find the most economical values of $n_1$, $n_2$, and $n_3$, we set up a cost equation
of the form

$$\text{cost} = c_1 n_1 + c_2 n_1 n_2 + c_3 n_1 n_2 n_3$$

and proceed to minimize the product of the variance and the cost as before.
The solutions are

$$n_2 = \sqrt{\frac{c_1 s_2^2}{c_2 s_1^2}}, \qquad n_3 = \sqrt{\frac{c_2 s_3^2}{c_3 s_2^2}},$$

while $n_1$ is found by solving either the cost or the variance equation. Note
that the formula for $n_2$ is the same in three-stage as in two-stage sampling,
and that the formula for $n_3$ is the natural extension of that for $n_2$. Putting
in the numerical values of the variance components, we obtain

$$n_2 = \sqrt{\frac{c_1(0.1610)}{c_2(0.3652)}}, \qquad n_3 = \sqrt{\frac{c_2(0.0067)}{c_3(0.1610)}}$$
Since the computed value of $n_3$ would be less than 1 for any likely value
of $c_2/c_3$ (it reaches 1 only when $c_2$ is about 24 times $c_3$), more than one
determination per leaf is uneconomical. The
optimum number $n_2$ of leaves per plant depends on the ratio $c_1/c_2$. This
will vary with the conditions of experimentation. If many plants are
being grown for some other purpose, so that ample numbers are available
for sampling, $c_1$ includes only the extra costs involved in collecting a
sample from many plants instead of a few plants. In this event the optimum
$n_2$ might also turn out to be 1. If the cost of growing extra plants is
to be included in $c_1$, the optimum $n_2$ might be higher than 1.
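The three-stage optima can be evaluated the same way. This Python sketch uses the section 10.14 components; the equal costs in the usage line are hypothetical and serve only to show the scale of the numbers.

```python
import math

# Variance components for the three-stage turnip green example (section 10.14):
# plants (stage 1), leaves within plants (stage 2), determinations (stage 3).
s1sq, s2sq, s3sq = 0.3652, 0.1610, 0.0067

def optimum_n2_n3(c1, c2, c3):
    """Continuous optima from minimizing variance x cost:
    n2 = sqrt(c1 s2^2 / (c2 s1^2)), n3 = sqrt(c2 s3^2 / (c3 s2^2))."""
    n2 = math.sqrt(c1 * s2sq / (c2 * s1sq))
    n3 = math.sqrt(c2 * s3sq / (c3 * s2sq))
    return n2, n3

# Cost ratio c2/c3 at which the optimum n3 first reaches 1.
breakeven = s2sq / s3sq
print(round(breakeven, 1))                                        # 24.0
print([round(x, 2) for x in optimum_n2_n3(c1=1, c2=1, c3=1)])     # [0.66, 0.2]
```

Since $n_3 = \sqrt{(c_2/c_3)/24.0}$, collecting and preparing an extra leaf would have to cost about 24 times as much as a determination before a second determination per leaf paid off.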

EXAMPLE 17.12.1 — This is the analysis of variance, on a single sub-sample basis, for
wheat yield and percentage of protein from data collected in a wheat sampling survey in
Kansas in 1939 (25).

                           Yield                    Protein
                           (Bushels Per Acre)       (%)
                           Degrees of   Mean        Degrees of   Mean
Source of Variation        Freedom      Square      Freedom      Square

Fields                        659       434.52         659       21.388
Samples within fields         660        67.54         609        2.870


Two sub-samples were taken at random from each of 660 fields. Calculate the components
of variance for yield. Ans. $s_1^2 = 183.49$, $s_2^2 = 67.54$. Note: Some of the protein
figures were evidently not recorded separately for each sub-sample, since there are only
609 d.f. within fields.

EXAMPLE 17.12.2 — For yield, estimate the variance of the sample mean for samples
consisting of (i) 1 sub-sample from each of 800 fields, (ii) 2 sub-samples from each of 400
fields, (iii) 8 sub-samples from each of 100 fields. Ans. (i) 0.313, (ii) 0.543, (iii) 1.919.
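The answers to examples 17.12.1 and 17.12.2 follow from the yield mean squares in a few lines of Python.

```python
# Example 17.12.1: mean squares for yield, with 2 sub-samples per field.
ms_fields, ms_within, k = 434.52, 67.54, 2
s2sq = ms_within                       # within-field component
s1sq = (ms_fields - ms_within) / k     # between-field component

# Example 17.12.2: variance of the mean, s1^2/n1 + s2^2/(n1 n2).
designs = [(800, 1), (400, 2), (100, 8)]
variances = [s1sq / n1 + s2sq / (n1 * n2) for n1, n2 in designs]
print(round(s1sq, 2), [round(v, 3) for v in variances])
```

Printed to three decimals the variances are 0.314, 0.543 and 1.919. The first is 0.3138 before rounding, so the answer of 0.313 given above appears to be truncated rather than rounded.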

EXAMPLE 17.12.3 — With 2 sub-samples per field, it is desired to take enough fields so
that the standard error of the mean yield will be not more than 1/2 bushel, and at the same
time the standard error of the mean protein percentage will be not more than $%. How 
many fields are required? Ans. about 870. 

EXAMPLE 17.12.4 — Suppose that it takes on the average 1 man-hour to locate and 
pace a field that is to be sampled. A single protein determination is to be made on the bulked 
sub-samples from any field. The cost of a determination is equivalent to 1 man-hour. It 
takes 15 minutes to locate, cut, and tie a sub-sample. From these data and the analysis of 
variance for protein percentage (example 17.12.1), compute the variance-cost product, VC, 
for each value of n 2 from 1 to 5. What is the most economical number of sub-samples per 
field? Ans. 2. How much more does it cost, for the same V, if 4 sub-samples per field are 
used? Ans. 12%. 
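Example 17.12.4 can be worked in Python as follows. The cost model is our reading of the example: each field costs 1 man-hour to locate and pace plus 1 man-hour for the single determination on its bulked sub-samples ($c_1 = 2$), and each sub-sample costs 15 minutes ($c_2 = 0.25$). Because $V$ is held fixed, the ratio of two $VC$ values is the ratio of their costs.

```python
# Example 17.12.4: protein percentage, bulked sub-samples, one determination per field.
ms_fields, ms_within = 21.388, 2.870      # protein mean squares, example 17.12.1
s2sq = ms_within
s1sq = (ms_fields - ms_within) / 2         # 2 sub-samples per field in the survey

c1 = 1.0 + 1.0   # man-hours per field: locate and pace (1) plus one determination (1)
c2 = 0.25        # man-hours per sub-sample: locate, cut and tie (15 minutes)

def vc(n2):
    """Variance-cost product (s1^2 + s2^2/n2)(c1 + c2 n2); n1 cancels out."""
    return (s1sq + s2sq / n2) * (c1 + c2 * n2)

table = {n2: vc(n2) for n2 in range(1, 6)}
best = min(table, key=table.get)
extra = table[4] / table[best] - 1        # extra cost of 4 sub-samples, same V
print(best, f"{extra:.0%}")               # 2 12%
```

This reproduces the stated answers: 2 sub-samples per field is cheapest, and 4 per field costs about 12% more for the same $V$.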

17.13 — Selection with probability proportional to size. In many im- 
portant sampling problems, the natural primary sampling units vary in 
size. In national surveys conducted to obtain information about the 
characteristics of the population, the primary unit is often an adminis- 
trative area (e.g., similar to a county). A relatively large unit of this type 
cuts down travel costs and makes supervision and control of the field 
work more manageable. Such units often vary substantially in the num- 
ber of people they contain. A sample of the houses in a town may use 
blocks as first-stage units, the number of houses per block ranging from 
0 to 40. Similarly, schools, hospitals, and factories all contain different 
numbers of individuals. 

With primary units like this, the between-primary-unit variances of 
the principal measurements may be large; for example, some counties 
are relatively wealthy and some are poor. In these circumstances, 
Hansen and Hurwitz (15) pointed out the advantages of selecting primary 
units with probabilities proportional to their sizes. To illustrate, consider
a population of three schools, having 600, 300, and 100 children.
The objective is to estimate the population mean per child for some characteristic.
The means per child in the three schools are $\bar Y_1 = 2$, $\bar Y_2 = 4$,
$\bar Y_3 = 1$. Hence, the population mean per child is

$$\bar Y = [(600)(2) + (300)(4) + (100)(1)]/1{,}000 = 2.5$$

To simplify things further, suppose that only one school is to be 
chosen, and that the variation in Y between children within the same 
school is negligible. It follows that we need not specify how the second- 
stage sample of children from a school is to be drawn, since any sample 
gives the correct mean for the chosen school. 

In selecting the school with probability proportional to size (pps),
the three schools receive probabilities 0.6, 0.3, and 0.1, respectively, of 
being drawn. We shall compare the mean square error of the estimate 
given by this method with that given by selecting the schools with equal 
probabilities. Table 17.13.1 contains the calculations. 


TABLE 17.13.1

Selection of a School With Probability Proportional to Size

             No. of     Probability of   Mean per     Error of Estimate
School       Children   Selection π_i    Child Ȳ_i    Ȳ_i − Ȳ             (Ȳ_i − Ȳ)²

1               600         0.6             2             −0.5               0.25
2               300         0.3             4             +1.5               2.25
3               100         0.1             1             −1.5               2.25
Population    1,000         1.0             2.5 (= Ȳ)




If the first school is selected, its estimate is in error by (2.0 − 2.5)
= −0.5, and so on. These errors and their squares appear in the two right-hand
columns of table 17.13.1. In repeated sampling with probability
proportional to size, the first school is drawn 60% of the time, the second
school 30%, and the third school 10%. The mean square error is therefore

$$\text{M.S.E.}_{pps} = (0.6)(0.25) + (0.3)(2.25) + (0.1)(2.25) = 1.05$$

If, alternatively, the schools are drawn with equal probability, the M.S.E.
is

$$\text{M.S.E.}_{eq} = \tfrac{1}{3}(0.25 + 2.25 + 2.25) = 1.58$$

This M.S.E. is about 50% higher than that given by pps selection.

You may ask : Does this result depend on the choice or the order of 
the means, 2, 4, 1, assigned to schools 1, 2, and 3? The answer is yes. 
With means 4, 2, 1, you will find M.S.E.$_{pps}$ = 1.29, M.S.E.$_{eq}$ = 2.14, the
latter being 66% higher. Over the six possible orders of the numbers
1, 2, 4, the ratio M.S.E.$_{eq}$/M.S.E.$_{pps}$ varies from 0.93 to 2.52. However,
the ratio of the average M.S.E.$_{eq}$ to the average M.S.E.$_{pps}$, taken over all six possible
orders, does not depend on the numbers 1, 2, 4. With $N$ primary units in
the population, this ratio is
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MSl eq (N- l) + N^(n l -n) 2 

MS - E pp* (N - 1) - N £ («, - 7t) 2 

where 7 t, is the probability of selection (relative size) of the z'th school. 
Clearly, this ratio exceeds one unless all n t are equal ; that is, all schools are 
the same size. 
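The arithmetic of this example can be checked with a short Python sketch. It assumes, as the text does, that a single school is drawn and its mean per child is used as the estimate of the population mean. The function name `mse_pps_and_eq` is ours, not the book's. The second half averages both mean square errors over the six orders of the means 1, 2, 4, then compares the ratio of those averages with the closed-form expression, taking π̄ = 1/N.

```python
from itertools import permutations


def mse_pps_and_eq(sizes, means):
    """M.S.E. of the one-school estimate under pps and under equal-probability selection."""
    total = sum(sizes)
    pi = [s / total for s in sizes]                    # relative sizes = pps probabilities
    pop_mean = sum(p * m for p, m in zip(pi, means))   # population mean per child
    sq_err = [(m - pop_mean) ** 2 for m in means]      # squared error if school i is drawn
    mse_pps = sum(p * e for p, e in zip(pi, sq_err))   # each school weighted by pi_i
    mse_eq = sum(sq_err) / len(sizes)                  # each school weighted by 1/N
    return mse_pps, mse_eq


sizes = [600, 300, 100]
pps, eq = mse_pps_and_eq(sizes, [2, 4, 1])
print(f"M.S.E. pps = {pps:.2f}, M.S.E. eq = {eq:.2f}")  # M.S.E. pps = 1.05, M.S.E. eq = 1.58

# Average both M.S.E.s over the six possible orders of the means 1, 2, 4.
results = [mse_pps_and_eq(sizes, list(order)) for order in permutations([1, 2, 4])]
avg_pps = sum(r[0] for r in results) / len(results)
avg_eq = sum(r[1] for r in results) / len(results)

# Closed-form ratio of the averages, with pi_bar = 1/N.
N = len(sizes)
pi = [s / sum(sizes) for s in sizes]
spread = N * sum((p - 1 / N) ** 2 for p in pi)
formula = ((N - 1) + spread) / ((N - 1) - spread)
print(f"enumerated ratio = {avg_eq / avg_pps:.3f}, formula = {formula:.3f}")  # both 1.469
```

Enumerating the orders reproduces the text's range of individual ratios (about 0.93 to 2.5). It also gives the same average ratio as the formula, 1.469 for these school sizes.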

The reason why it usually pays to select large units with higher prob- 
abilities is that the population mean depends more on the means of the 
large units than on those of the small units. The large units are therefore 
likely to give better estimates. 

With two-stage sampling, a simple method is to select n primary units 
with pps and take an equal number of sub-units (e.g., children) in every 
selected primary unit. This method gives every sub-unit in the population 
the same chance of being in the sample. The sample mean per sub-unit, ȳ,
is an unbiased estimate of the corresponding population mean, and its
estimated variance is obtained by the simple formula

$$s_{\bar{y}}^2 = \frac{\sum_i (\bar{y}_i - \bar{y})^2}{n(n-1)} \qquad (17.13.1)$$

where ȳ_i is the mean of the sample from the ith primary unit.
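Below is a minimal Python sketch of formula 17.13.1. It assumes the n primary units are drawn with pps and with replacement, and that the same number of sub-units is measured in each. The helper `pps_draw` is hypothetical and shows one way to make the draws with the standard library's `random.choices`. The four unit means are invented for illustration.

```python
import random


def pps_draw(sizes, n, seed=None):
    """Hypothetical helper: draw n primary-unit indices with replacement, pps."""
    rng = random.Random(seed)
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def mean_and_variance(unit_means):
    """Sample mean per sub-unit and its estimated variance by formula 17.13.1.

    Assumes an equal number of sub-units was measured in each selected
    primary unit, so the mean per sub-unit is the mean of the unit means.
    """
    n = len(unit_means)
    y_bar = sum(unit_means) / n
    s2 = sum((m - y_bar) ** 2 for m in unit_means) / (n * (n - 1))
    return y_bar, s2


print(pps_draw([600, 300, 100], n=4, seed=1))   # indices of the selected units

# Hypothetical means per sub-unit in four selected primary units.
unit_means = [2.1, 3.4, 2.8, 1.9]
y_bar, s2 = mean_and_variance(unit_means)
print(f"mean = {y_bar:.2f}, variance = {s2:.4f}, s.e. = {s2 ** 0.5:.3f}")
```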

We have illustrated only the simplest case. Formula 17.13.1 as- 
sumes that the n units are selected with replacement (i.e., that a unit can 
be chosen more than once). Some complications arise when we select 
units without replacement. Often, the sizes of the units are not known 
exactly, and have to be estimated in advance. Considerations of cost or of 
the structure of variability in the population may lead to the selection of 
units with probabilities that are unequal, but are proportional to some 
quantity other than the sizes. For details, see the references. In extensive 
surveys, multistage sampling with unequal probabilities of selection of 
primary units is the commonest method in current practice. 

17.14 — Ratio and regression estimates. The ratio estimate is a differ- 
ent way of estimating population totals (or means) that is useful in many 
sampling problems. Suppose that you have taken a sample in order to 
estimate the population total of a variable, Y, and that a complete count
of the population was made on some previous occasion. Let X denote the
value of the variable on the previous occasion. You might then compute
the ratio

$$R = \frac{\sum Y}{\sum X}$$

where the sums are taken over the sample. This ratio is an estimate of the
present level of the variate relative to that on the previous occasion.
Multiplying the ratio by the known population total on the previous
occasion (i.e., by the population total of X), you obtain the ratio estimate
of the population total of Y. Clearly, if the relative change is about the
same on all sampling units, the ratio R will be accurate and the estimate
of the population total will be a good one.

The ratio estimate can also be used when X is some other kind of sup- 
plementary variable. The conditions for a successful application of this 
estimate are that the ratio Y/X should be relatively constant over the pop- 
ulation and that the population total of X should be known. Consider 
an estimate of the total amount of a crop, just after harvest, made from a 
sample of farms in some region. For each farm in the sample we record the 
total yield, Y, and the total acreage, X, of that crop. In this case the ratio,
R = ΣY/ΣX, is the sample estimate of the mean yield per acre. This
is multiplied by the total acreage of the crop in the region, which would 
have to be known accurately from some other source. This estimate will 
be precise if the mean yield per acre varies little from farm to farm. 

The estimated standard error of the ratio estimate Ŷ_R of the popula-
tion total from a simple random sample of size n is, approximately,

$$s(\hat{Y}_R) = N\sqrt{\frac{\sum (Y - RX)^2}{n(n-1)}}$$
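As a sketch of the ratio estimate and its approximate standard error, assume a simple random sample of n = 5 farms. The region is assumed to contain N = 200 farms, and its total acreage of the crop is taken to be known from another source as 12,000 acres. All of these figures are invented for illustration, and no finite population correction is applied.

```python
import math


def ratio_estimate_total(y, x, x_pop_total, N):
    """Ratio estimate of the population total of Y and its approximate standard error."""
    n = len(y)
    R = sum(y) / sum(x)                                   # R = sum(Y) / sum(X)
    total_hat = R * x_pop_total                           # ratio estimate of the Y total
    resid_ss = sum((yi - R * xi) ** 2 for yi, xi in zip(y, x))
    se = N * math.sqrt(resid_ss / (n * (n - 1)))          # no finite population correction
    return R, total_hat, se


# Hypothetical sample of 5 farms: acreage X and total yield Y of the crop.
acres = [50, 80, 40, 70, 60]
yields = [1500, 2500, 1100, 2200, 1700]
R, total_hat, se = ratio_estimate_total(yields, acres, x_pop_total=12_000, N=200)
print(f"R = {R:.1f} per acre, total = {total_hat:,.0f}, s.e. = {se:,.0f}")
# R = 30.0 per acre, total = 360,000, s.e. = 8,944
```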


The ratio estimate is not always more precise than the simpler estimate
Nȳ (number of units in population × sample mean). It has been
shown that the ratio estimate is more precise only if ρ, the correlation
coefficient between Y and X, exceeds C_X/(2C_Y), where the C's are the
coefficients of variation. Consequently, ratio estimates must not be used
indiscriminately, although in appropriate circumstances they produce 
large gains in precision. 

Sometimes the purpose of the sampling is to estimate a ratio, e.g., the
ratio of dry weight to total weight or of clean wool to total wool. The
estimated standard error of the estimate is then

$$s(R) = \frac{1}{\bar{x}}\sqrt{\frac{\sum (Y - RX)^2}{n(n-1)}}$$

where x̄ is the sample mean of X.


This formula has already been given (in a different notation) at the end of 
section 17.5, where the estimation of proportions from cluster sampling 
was discussed. 
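A sketch of the standard error of an estimated ratio follows. It uses s(R) = (1/x̄)√[Σ(Y - RX)²/n(n - 1)], with x̄ the sample mean of X, and ignores the finite population correction. The dry-weight and total-weight figures are invented for illustration.

```python
import math


def ratio_and_se(y, x):
    """Estimated ratio R = sum(Y)/sum(X) and its approximate standard error."""
    n = len(y)
    R = sum(y) / sum(x)
    x_bar = sum(x) / n                                    # sample mean of X
    resid_ss = sum((yi - R * xi) ** 2 for yi, xi in zip(y, x))
    se = math.sqrt(resid_ss / (n * (n - 1))) / x_bar
    return R, se


# Hypothetical: dry weight (Y) and total weight (X) of five sample units.
dry = [4.1, 4.7, 3.3, 4.4, 3.5]
total = [10.0, 12.0, 8.0, 11.0, 9.0]
R, se = ratio_and_se(dry, total)
print(f"R = {R:.3f}, s.e. = {se:.4f}")  # R = 0.400, s.e. = 0.0045
```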

In chapter 6 the linear regression of Y on X and its sample estimate,

Ŷ = ȳ + bx,

were discussed. With an auxiliary variable, X, you may find that when
you plot Y against X from the sample data, the points appear to lie close
to a straight line, but the line does not go through the origin. This implies
that the ratio Y/X is not constant over the sample. As pointed out in
section 6.19, it is then advisable to use a linear regression estimate instead
of the ratio estimate. For the population total of Y, the linear regression
estimate is

$$N\hat{\bar{Y}} = N\{\bar{y} + b(\bar{X} - \bar{x})\}$$

where X̄ is the population mean of X. The term inside the brackets is
the sample mean, ȳ, adjusted for regression. To see this, suppose that you
have taken a sample in which ȳ = 2.35, x̄ = 1.70, X̄ = 1.92, b = +0.4.
Your first estimate of the population mean would be ȳ = 2.35. But in
the sample the mean value of X is too low by an amount (1.92 - 1.70)
= 0.22. Further, the value of b tells you that a unit increase in X is accom-
panied, on the average, by a +0.4 unit increase in Y. Hence, to correct
for the low value of the mean of X, you increase the sample mean by the
amount (+0.4)(0.22). Thus the adjusted value of ȳ is

$$2.35 + (+0.4)(0.22) = 2.44 = \bar{y} + b(\bar{X} - \bar{x})$$

To estimate the population total, this value is multiplied by N, the number 
of sampling units in the population. 

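The standard error of the estimated population total is, approxi-
mately,

$$N s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{(\bar{X} - \bar{x})^2}{\sum x^2}}$$

where s_{y·x} is the standard deviation of the deviations from the sample
regression line and Σx² is the sum of squares of the deviations of the
sample X values from x̄.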

If a finite population correction is required in the standard error formulas
presented in this section, insert the factor √(1 - φ), where φ = n/N is the
sampling fraction. In finite populations the ratio and regression estimates
are both slightly biased, but the bias is seldom important in practice.
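Below is a minimal Python sketch of the regression estimate and its approximate standard error. It makes two assumptions that the text does not state. First, s_{y·x} is computed from the residuals of the fitted line with n - 2 degrees of freedom. Second, the optional finite population correction is taken as the factor √(1 - n/N). The sample values and population figures are hypothetical. The first print reproduces the adjusted mean 2.44 from the worked example.

```python
import math


def regression_estimate_total(y, x, X_bar, N, fpc=False):
    """Linear regression estimate of the population total of Y and its approximate s.e."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # sum of x^2 (deviations)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                                                    # sample regression coefficient
    adjusted_mean = y_bar + b * (X_bar - x_bar)                      # y_bar adjusted for regression
    resid_ss = sum((yi - y_bar - b * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(resid_ss / (n - 2))                             # assumed n - 2 df
    se = N * s_yx * math.sqrt(1 / n + (X_bar - x_bar) ** 2 / sxx)
    if fpc:
        se *= math.sqrt(1 - n / N)                                   # assumed form of the f.p.c.
    return N * adjusted_mean, se


# Worked example from the text: y_bar = 2.35, x_bar = 1.70, X_bar = 1.92, b = +0.4.
print(round(2.35 + 0.4 * (1.92 - 1.70), 2))  # 2.44

# Hypothetical sample of six units from a population of N = 500 units with X_bar = 5.0.
x = [3.0, 4.0, 5.0, 6.0, 4.0, 5.0]
y = [7.1, 8.8, 11.2, 12.9, 9.0, 10.9]
total_hat, se = regression_estimate_total(y, x, X_bar=5.0, N=500, fpc=True)
print(f"estimated total = {total_hat:,.1f}, s.e. = {se:,.1f}")
```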

17.15 — Further reading. The general books on sample surveys that 
have become standard, (2), (3), (4), (5), (13), involve roughly the same level 
of mathematical difficulty and knowledge of statistics. Reference (3) is 
oriented towards applications in business, and reference (13) towards 
those in agriculture. Another good book for agricultural applications, at 
a lower mathematical level, is (16). 

Useful short books are (17), an informal, popular account of some 
of the interesting applications of survey methods, (18), which conducts the 
reader painlessly through the principal results in probability sampling 
at about the mathematical level of this chapter, and (19), which discusses
the technique of constructing interview questions.

Books and papers have also begun to appear on some of the common 
specific types of application. For sampling a town under U.S. conditions, 
with the block as primary sampling unit, references (20) and (21) are rec- 
ommended. Reference (22), intended primarily for surveys by health 
agencies to check on the immunization status of children, gives instruc- 
tions for the sampling of attributes in local areas, while (24) deals with the 
sampling of hospitals and patients. Much helpful advice on the use of 
sampling in agricultural censuses is found in (23). Sampling techniques
for estimating the volume of timber of the principal types and age-classes
in forestry are summarized in (11), while (9) reviews the difficult problem
of estimating wildlife populations. 
Notes

Interpolation. In analyses of data and in working the examples in this 
book, use of the nearest entry in any Appendix table is accurate enough 
in most cases. The following examples illustrate linear interpolation, 
which will sometimes be needed. 

1. Find the 5% significance level of χ² for 34 degrees of freedom.
For P = 0.050, Table A 5 gives

df    30      40
χ²    43.77   55.76

Calculate (34 - 30)/(40 - 30) = 0.4. Since

34 = 30 + 0.4(40 - 30)

the required value of χ² is

43.77 + 0.4(55.76 - 43.77) = 43.77 + 0.4(11.99) = 48.57

Alternatively, this value can be computed as

$$(0.4)\chi^2_{40} + (0.6)\chi^2_{30} = (0.4)(55.76) + (0.6)(43.77) = 48.57$$

Note that 0.4 multiplies χ² for 40 df, not χ² for 30 df.

2. An analysis gave an F value of 2.04 for 3 and 18 df. Find the
significance probability. For 3 and 18 df, Table A 14, part II, gives the
following entries:

P    0.25    ?      0.10
F    1.49    2.04   2.42

Calculate (2.04 - 1.49)/(2.42 - 1.49) = 0.55/0.93 = 0.59. By the alterna-
tive method of the preceding example,

P = (0.59)(0.10) + (0.41)(0.25) = 0.16
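Both examples use the same weighted average, which can be written as a small Python function; the name `interpolate` is ours. The book rounds the weight to 0.59 before combining. The sketch keeps full precision, and the answers agree to two decimals. Interpolating P linearly in F is itself an approximation, adequate for the rough significance probabilities the text has in mind.

```python
def interpolate(x, x0, x1, y0, y1):
    """Linear interpolation between table entries (x0, y0) and (x1, y1)."""
    w = (x - x0) / (x1 - x0)          # fraction of the way from x0 to x1
    return (1 - w) * y0 + w * y1      # w multiplies the entry at x1, not at x0


# Example 1: 5% level of chi-square for 34 df from the 30 and 40 df entries.
print(round(interpolate(34, 30, 40, 43.77, 55.76), 2))  # 48.57

# Example 2: significance probability for F = 2.04, interpolating P between
# F = 1.49 (P = 0.25) and F = 2.42 (P = 0.10).
print(round(interpolate(2.04, 1.49, 2.42, 0.25, 0.10), 2))  # 0.16
```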

Finding Square Roots. Table A 18 is a table of square roots. To save
space the entries jump by 0.02 instead of 0.01, but interpolation will rarely 
be necessary. With very large or very small numbers, mistakes in finding 
square roots are common. The following examples should clarify the 
procedure. 


             Step (1)     Step (2)          Step (3)   Step (4)
Number       Mark Off     Column to Read    Reading    Square Root
6,028.0      60,28.0      √(10n)            7.76       77.6
397.2        3,97.2       √n                1.99       19.9
46.38        46.38        √(10n)            6.81       6.81
0.194        0.19,4       √(10n)            4.40       0.440
0.000893     0.00,08,93   √n                2.99       0.0299


In step (1), mark off the digits in twos to the right or left of the decimal
point. Step (2) tells which column of the square root table is to be read.
With 3,97.2 and 0.00,08,93 read the √n column, because there is a single
digit (3 or 8) to the left of the first comma that has any non-zero digits to
its left. If there are two digits to the left of the first comma, as in 60,28.0,
read the √10n column. Step (3) gives the reading, taken directly from the
nearest entry in the table.

The final step (4) finds the actual square roots. Consider, first, num-
bers greater than 1. If column (1) has no comma to the left of the decimal,
as with 46.38, the square root has one digit to the left of the decimal. If
column (1) has one comma to the left of the decimal, as with 60,28.0 and
3,97.2, the square root has two digits to the left of the decimal, and so on.
With numbers smaller than 1, replace any pair 00 to the right of the deci- 
mal by a single 0. Thus, the square root of 0.00,08,93 is 0.0299 as shown. 
The square root of 0.00,00,08,93 is 0.00299. 
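The marking-off rule amounts to writing the number as m × 100^k with 1 ≤ m < 100. If m < 10, you read the √n column; if m ≥ 10, you read the √10n column. The decimal point of the reading is then shifted k places. The Python sketch below follows that rule. It is an illustration only: it computes the reading with `math.sqrt` instead of taking the nearest entry of Table A 18, so its readings carry more digits than the table's. The function name `mark_off_root` is ours.

```python
import math


def mark_off_root(x):
    """Mimic the mark-off procedure: return (column, reading, square root)."""
    if x <= 0:
        raise ValueError("x must be positive")
    m, k = x, 0
    while m >= 100:          # one comma to the left of the decimal point
        m /= 100
        k += 1
    while m < 1:             # one pair of digits to the right of the decimal point
        m *= 100
        k -= 1
    column = "sqrt(n)" if m < 10 else "sqrt(10n)"
    reading = math.sqrt(m)   # always between 1 and 10, as in the table
    return column, reading, reading * 10 ** k


for x in [6028.0, 397.2, 46.38, 0.194, 0.000893]:
    column, reading, root = mark_off_root(x)
    print(f"{x:>10} {column:>10} {reading:6.2f} {root:.3g}")
```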



TABLE A 1 

Ten Thousand Randomly Assorted Digits 



      00-04  05-09  10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49
00    54463  22662  65905  70639  79365  67382  29085  69831  47058  08186
01    15389  85205  18850  39226  42249  90669  96325  23248  60933  26927
02    85941  40756  82414  02015  13858  78030  16269  65978  01385  15345
03    61149  69440  11286  88218  58925  03638  52862  62733  33451  77455
04    05219  81619  10651  67079  92511  59888  84502  72095  83463  75577
05    41417  98326  87719  92294  46614  50948  64886  20002  97365  30976
06    28357  94070  20652  35774  16249  75019  21145  05217  47286  76305
07    17783  00015  10806  83091  91530  36466  39981  62481  49177  75779
08    40950  84820  29881  85966  62800  70326  84740  62660  77379  90279
09    82995  64157  66164  41180  10089  41757  78258  96488  88629  37231
10    96754  17676  55659  44105  47363  34833  86679  23930  53249  27083
11    34357  88040  53364  71726  45690  66334  60332  22554  90600  71113
12    06318  37403  49927  57715  50423  67372  63116  48888  21505  80182
13    62111  52820  07243  79931  89292  84767  85693  73947  22278  11551
14    47534  09243  67879  00544  23410  12740  02540  54440  32949  13491
15    98614  75993  84460  62846  59844  14922  48730  73443  48167  34770
16    24856  03648  44898  09351  98795  18644  39765  71058  90368  44104
17    96887  12479  80621  66223  86085  78285  02432  53342  42846  94771
18    90801  21472  42815  77408  37390  76766  52615  32141  30268  18106
19    55165  77312  83666  36028  28420  70219  81369  41943  47366  41067
20    75884  12952  84318  95108  72305  64620  91318  89872  45375  85436
21    16777  37116  58550  42958  21460  43910  01175  87894  81378  10620
22    46230  43877  80207  88877  89380  32992  91380  03164  98656  59337
23    42902  66892  46134  01432  94710  23474  20423  60137  60609  13119
24    81007  00333  39693  28039  10154  95425  39220  19774  31782  49037
25    68089  01122  51111  72373  06902  74373  96199  97017  41273  21546
26    20411  67081  89950  16944  93054  87687  96693  87236  77054  33848
27    58212  13160  06468  15718  82627  76999  05999  58680  96739  63700
28    70577  42866  24969  61210  76046  67699  42054  12696  93758  03283
29    94522  74358  71659  62038  79643  79169  44741  05437  39038  13163
30    42626  86819  85651  88678  17401  03252  99547  32404  17918  62880
31    16051  33763  57194  16752  54450  19031  58580  47629  54132  60631
32    08244  27647  33851  44705  94211  46716  11738  55784  95374  72655
33    59497  04392  09419  89964  51211  04894  72882  17805  21896  83864
34    97155  13428  40293  09985  58434  01412  69124  82171  59058  82859
35    98409  66162  95763  47420  20792  61527  20441  39435  11859  41567
36    45476  84882  65109  96597  25930  66790  65706  61203  53634  22557
37    89300  69700  50741  30329  11658  23166  05400  66669  48708  03887
38    50051  95137  91631  66315  91428  12275  24816  68091  71710  33258
39    31753  85178  31310  89642  98364  02306  24617  09609  83942  22716
40    79152  53829  77250  20190  56535  18760  69942  77448  33278  48805
41    44560  38750  83635  56540  64900  42912  13953  79149  18710  68618
42    68328  83378  63369  71381  39564  05615  42451  64559  97501  65747
43    46939  38689  58625  08342  30459  85863  20781  09284  26333  91777
44    83544  86141  15707  96256  23068  13782  08467  89469  93842  55349
45    91621  00881  04900  54224  46177  55309  17852  27491  89415  23466
46    91896  67126  04151  03795  59077  11848  12630  98375  52068  60142
47    55751  62515  21108  80830  02263  29303  37204  96926  30506  09808
48    85156  87689  95493  88842  00664  55017  55539  17771  69448  87530
49    07521  56898  12236  60277  39102  62315  12239  07105  11844  01117
      50-54  55-59  60-64  65-69  70-74  75-79  80-84  85-89  90-94  95-99
00    59391  58030  52098  82718  87024  82848  04190  96574  90464  29065
01    99567  76364  77204  04615  27062  96621  43918  01896  83991  51141
02    10363  97518  51400  25670  98342  61891  27101  37855  06235  33316
03    86859  19558  64432  16706  99612  59798  32803  67708  15297  28612
04    11258  24591  36863  55368  31721  94335  34936  02566  80972  08188
05    95068  88628  35911  14530  33020  80428  39936  31855  34334  64865
06    54463  47237  73800  91017  36239  71824  83671  39892  60518  37092
07    16874  62677  57412  13215  31389  62233  80827  73917  82802  84420
08    92494  63157  76593  91316  03505  72389  96363  52887  01087  66091
09    15669  56689  35682  40844  53256  81872  35213  09840  34471  74441
10    99116  75486  84989  23476  52967  67104  39495  39100  17217  74073
11    15696  10703  65178  90637  63110  17622  53988  71087  84148  11670
12    97720  15369  51269  69620  03388  13699  33423  67453  43269  56720
13    11666  13841  71681  98000  35979  39719  81899  07449  47985  46967
14    71628  73130  78783  75691  41632  09847  61547  18707  85489  69944
15    40501  51089  99943  91843  41995  88931  73631  69361  05375  15417
16    22518  55576  98215  82068  10798  86211  36584  67466  69373  40054
17    75112  30485  62173  02132  14878  92879  22281  16783  86352  00077
18    80327  02671  98191  84342  90813  49268  95441  15496  20168  09271
19    60251  45548  02146  05597  48228  81366  34598  72856  66762  17002
20    57430  82270  10421  00540  43648  75888  66049  21511  47676  33444
21    73528  39559  34434  88596  54086  71693  43132  14414  79949  85193
22    25991  65959  70769  64721  86413  33475  42740  06175  82758  66248
23    78388  16638  09134  59980  63806  48472  39318  35434  24057  74739
24    12477  09965  96657  57994  59439  76330  24596  77515  09577  91871
25    83266  32883  42451  15579  38155  29793  40914  65990  16255  17777
26    76970  80876  10237  39515  79152  74798  39357  09054  73579  92359
27    37074  65198  44785  68624  98336  84481  97610  78735  46703  98265
28    83712  06514  30101  78295  54656  85417  43189  60048  72781  72606
29    20287  56862  69727  94443  64936  08366  27227  05158  50326  59566
30    74261  32592  86538  2??41  65172  85532  07571  80609  39285  65340
31    64081  49863  08478  96001  18888  14810  70545  89755  59064  07210
32    05617  75818  47750  67814  29575  10526  66192  44464  27058  40467
33    26793  74951  95466  74307  13330  42664  85515  20632  05497  33625
34    65988  72850  48737  54719  52056  01596  03845  35067  03134  70322
35    27366  42271  44300  73399  21105  03280  73457  43093  05192  48657
36    56760  10909  98147  34736  33863  95256  12731  66598  50771  83665
37    72880  43338  93643  58904  59543  23943  11231  83268  65938  81581
38    77888  38100  03062  58103  47961  83841  25878  23746  55903  44115
39    28440  07819  21580  51459  47971  29882  13990  29226  23608  15873
40    63525  94441  77033  12147  51054  49955  58312  76923  96071  05813
41    47606  93410  16359  89033  89696  47231  64498  31776  05383  39902
42    52669  45030  96279  14709  52372  87832  02735  50803  72744  88208
43    16738  60159  07425  62369  07515  82721  37875  71153  21315  00132
44    59348  11695  45751  15865  74739  05572  32688  20271  65128  14551
45    12900  71775  29845  60774  94924  21810  38636  33717  67598  82521
46    75086  23537  49939  33595  13484  97588  28617  17979  70749  35234
47    99495  51434  29181  09993  38190  42553  68922  52125  91077  40197
48    26075  31671  45386  36583  93459  48599  52022  41330  60651  91321
49    13636  93596  23377  51133  95126  61496  42474  45141  46660  42338
      00-04  05-09  10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49
50    64249  63664  39652  40646  97306  31741  07294  84149  46797  82487
51    26538  44249  04050  48174  65570  44072  40192  51153  11397  58212
52    05845  00512  78630  55328  18116  69296  91705  86224  29503  57071
53    74897  68373  67359  51014  33510  83048  17056  72506  82949  54600
54    20872  54570  35017  88132  25730  22626  86723  91691  13191  77212
55    31432  96156  89177  75541  81355  24480  77243  76690  42507  ?4362
56    66890  61505  01240  00660  05873  13568  76082  79172  57913  93448
57    41894  57790  79970  33106  86904  48119  52503  24130  72824  21627
58    11303  87118  81471  52936  08555  28420  49416  44448  04269  27029
59    54374  57325  16947  45356  78371  10563  97191  53798  12693  27928
60    64852  34421  61046  90849  13966  39810  42699  21753  76192  10508
61    16309  20384  09491  91588  97720  89846  30376  76970  23063  35894
62    42587  37065  24526  72602  57589  98131  37292  05967  26002  51945
63    40177  98590  97161  41682  84533  67588  62036  49967  01990  72308
64    82309  76128  93965  26743  24141  04838  40254  26065  07938  76236
65    79788  68243  59732  04257  27084  14743  17520  95401  55811  76099
66    40538  79000  89559  25026  42274  23489  34502  75508  06059  86682
67    64016  73598  18609  73150  62463  33102  45205  87440  96767  67042
68    49767  12691  17903  93871  99721  79109  09425  26904  07419  76013
69    76974  55108  29795  08404  82684  00497  51126  79935  57450  55671
70    23854  08480  85983  96025  50117  64610  99425  62291  86943  21541
71    68973  70551  25098  78033  98573  79848  31778  29555  61446  23037
72    36444  93600  65350  14971  25325  00427  52073  64280  18847  24708
73    03003  87800  07391  11594  21196  00781  32550  57158  58887  73041
74    17540  26188  36647  78386  04558  61463  57842  90382  77019  24210
75    38916  55809  47982  41968  69760  79422  80154  91486  19180  15100
76    64288  19843  69122  42502  48508  28820  59933  72998  99942  10515
77    86809  51564  38040  39418  49915  19000  58050  16899  79952  57849
78    99800  99566  14742  05028  30033  94889  53381  23656  75787  59223
79    92345  31890  95712  08279  91794  94068  49337  88674  35355  12267
80    90363  65162  32245  82279  79256  80834  06088  99462  56705  06118
81    64437  32242  48431  04835  39070  59702  31508  60935  22390  52246
82    91714  53662  28373  34333  55791  74758  51144  18827  30704  76803
83    20902  17646  31391  31459  33315  03444  55743  74701  58851  27427
84    12217  86007  70371  52281  14510  76094  96579  54853  78339  20839
85    45177  02863  42307  53571  22532  74921  17735  42201  80540  54721
86    28325  90814  08804  52746  47913  54577  47525  77705  95330  21866
87    29019  28776  56116  54791  64604  08815  46049  71186  34650  14994
88    84979  81353  56219  67062  26146  82567  33122  14124  46240  92973
89    50371  26347  48513  63915  11158  25563  91915  18431  92978  11591
90    53422  06825  69711  67950  64716  18003  49581  45378  99878  61130
91    67453  35651  89316  41620  32048  70225  47597  33137  31443  51445
92    07294  85353  74819  23445  68237  07202  99515  62282  53809  26685
93    79544  00302  45338  16015  66613  88968  14595  63836  77716  79596
94    04144  85442  82060  46471  24162  39500  87351  36637  42833  71875
95    90919  11883  58318  00042  52402  28210  34075  33272  00840  73268
96    06670  57353  86275  92276  77591  46924  60839  55437  03183  13191
97    36634  93976  52062  83678  41256  60948  18685  48992  19462  96062
98    75101  72891  85745  67106  26010  62107  60885  37503  55461  ?1213
99    05112  ?1222  72654  51583  05228  62056  57390  42746  39272  96659





      50-54  55-59  60-64  65-69  70-74  75-79  80-84  85-89  90-94  95-99
50    32847  31282  03345  89593  69214  70381  78285  20054  91018  16742
51    16916  00041  30236  55023  14253  76582  12092  86533  92426  37655
52    66176  34037  21005  27137  03193  48970  64625  22394  39622  79085
53    46299  13335  12180  16861  38043  59292  62675  63631  37020  78195
54    22847  47839  45385  23289  47526  54098  45683  55849  51575  64689
55    41851  54160  92320  69936  34803  92479  33399  71160  64777  83378
56    28444  59497  91586  95917  68553  28639  06455  34174  11130  91994
57    47520  62378  98855  83174  13088  16561  68559  26679  06238  51254
58    34978  63271  13142  82681  05271  08822  06490  44984  49307  61717
59    37404  80416  69035  92980  49486  74378  75610  74976  70056  15478
60    32400  65482  52099  53676  74648  94148  65095  69597  52771  71551
61    89262  86332  51718  70663  11623  29834  79820  73002  84886  03591
62    86866  09127  98021  0387?  27789  58444  44832  36505  40672  30180
63    90814  14833  08759  74645  05046  94056  99094  65091  32663  73040
64    19192  82756  20553  58446  55376  88914  75096  26119  83898  43816
65    77585  52593  56612  95766  10019  29531  73064  20953  53523  58136
66    23757  16364  05096  03192  62386  45389  85332  18877  55710  96459
67    45989  96257  23850  26216  23309  21526  07425  50254  19455  29315
68    92970  94243  07316  41467  64837  52406  25225  51553  31220  14032
69    74346  59596  40088  98176  17896  86900  20249  77753  19099  48885
70    87646  41309  27636  45153  29988  94770  07255  70908  05340  99751
71    50099  71038  45146  06146  55211  99429  43169  66259  97786  59180
72    10127  46900  64984  75348  04115  33624  68774  60013  35515  62556
73    67995  81977  18984  64091  02785  27762  42529  97144  80407  64524
74    26304  80217  84934  82657  69291  35397  98714  35104  08187  48109
75    81994  41070  56642  64091  31229  02595  13513  45148  78722  30144
76    59537  34662  79631  89403  65212  09975  06118  86197  58208  16162
77    51228  10937  62396  81460  47331  91403  95007  06047  16846  64809
78    31089  37995  29577  07828  42272  54016  21950  86192  99046  84864
79    38207  97938  93459  75174  79460  55436  57206  87644  21296  43393
80    88666  31142  09474  89712  63153  62333  42212  06140  42594  43671
81    53365  56134  67582  92557  89520  33452  05134  70628  27612  33738
82    89807  74530  38004  90102  11693  90257  05500  79920  62700  43325
83    18682  81038  85662  90915  91631  22223  91588  80774  07716  12548
84    63571  32579  63942  25371  09234  94592  98475  76884  37635  33608
85    68927  56492  67799  95398  77642  ?4913  91583  08421  81450  76229
86    56401  63186  39389  88798  31356  89235  97036  32341  33292  73757
87    24333  95603  02359  72942  46287  95382  08452  62862  97869  71775
88    17025  84202  95199  62272  06366  16175  97577  99304  41587  03686
89    02804  08253  52133  20224  68034  50865  57868  22343  55111  03607
90    08298  03879  20995  19850  73090  13191  18963  82244  78479  99121
91    59883  01785  82403  96062  03785  03488  12970  64896  38336  30030
92    46982  06682  62864  91837  74021  89094  39952  64158  79614  78235
93    31121  47266  07661  02051  67599  24471  69843  83696  71402  76287
94    97867  56641  63416  17577  30161  87320  37752  73276  48969  41915
95    57364  86746  08415  14621  49430  22311  15836  72492  49372  44103
96    09559  26263  69511  28064  75999  44540  13337  10918  79846  54809
97    53873  55571  00608  42661  91332  63956  74087  59008  47493  99581
98    35531  19162  86406  05299  77511  24311  57257  22826  77555  05941
99    28229  88629  25695  94932  30721  16197  78742  34974  97528  45447



TABLE A 2 

Ordinates of the Normal Curve 


                         Second decimal place in Z
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.3989  0.3989  0.3989  0.3988  0.3986  0.3984  0.3982  0.3980  0.3977  0.3973
0.1    .3970   .3965   .3961   .3956   .3951   .3945   .3939   .3932   .3925   .3918
0.2    .3910   .3902   .3894   .3885   .3876   .3867   .3857   .3847   .3836   .3825
0.3    .3814   .3802   .3790   .3778   .3765   .3752   .3739   .3725   .3712   .3697
0.4    .3683   .3668   .3653   .3637   .3621   .3605   .3589   .3572   .3555   .3538
0.5    .3521   .3503   .3485   .3467   .3448   .3429   .3410   .3391   .3372   .3352
0.6    .3332   .3312   .3292   .3271   .3251   .3230   .3209   .3187   .3166   .3144
0.7    .3123   .3101   .3079   .3056   .3034   .3011   .2989   .2966   .2943   .2920
0.8    .2897   .2874   .2850   .2827   .2803   .2780   .2756   .2732   .2709   .2685
0.9    .2661   .2637   .2613   .2589   .2565   .2541   .2516   .2492   .2468   .2444
1.0    .2420   .2396   .2371   .2347   .2323   .2299   .2275   .2251   .2227   .2203
1.1    .2179   .2155   .2131   .2107   .2083   .2059   .2036   .2012   .1989   .1965
1.2    .1942   .1919   .1895   .1872   .1849   .1826   .1804   .1781   .1758   .1736
1.3    .1714   .1691   .1669   .1647   .1626   .1604   .1582   .1561   .1539   .1518
1.4    .1497   .1476   .1456   .1435   .1415   .1394   .1374   .1354   .1334   .1315
1.5    .1295   .1276   .1257   .1238   .1219   .1200   .1182   .1163   .1145   .1127
1.6    .1109   .1092   .1074   .1057   .1040   .1023   .1006   .0989   .0973   .0957
1.7    .0940   .0925   .0909   .0893   .0878   .0863   .0848   .0833   .0818   .0804
1.8    .0790   .0775   .0761   .0748   .0734   .0721   .0707   .0694   .0681   .0669
1.9    .0656   .0644   .0632   .0620   .0608   .0596   .0584   .0573   .0562   .0551
2.0    .0540   .0529   .0519   .0508   .0498   .0488   .0478   .0468   .0459   .0449
2.1    .0440   .0431   .0422   .0413   .0404   .0396   .0387   .0379   .0371   .0363
2.2    .0355   .0347   .0339   .0332   .0325   .0317   .0310   .0303   .0297   .0290
2.3    .0283   .0277   .0270   .0264   .0258   .0252   .0246   .0241   .0235   .0229
2.4    .0224   .0219   .0213   .0208   .0203   .0198   .0194   .0189   .0184   .0180
2.5    .0175   .0171   .0167   .0163   .0158   .0154   .0151   .0147   .0143   .0139
2.6    .0136   .0132   .0129   .0126   .0122   .0119   .0116   .0113   .0110   .0107
2.7    .0104   .0101   .0099   .0096   .0093   .0091   .0088   .0086   .0084   .0081
2.8    .0079   .0077   .0075   .0073   .0071   .0069   .0067   .0065   .0063   .0061
2.9    .0060   .0058   .0056   .0055   .0053   .0051   .0050   .0048   .0047   .0046

                         First decimal place in Z
Z      0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
3      0.0044  0.0033  0.0024  0.0017  0.0012  0.0009  0.0006  0.0004  0.0003  0.0002
4      .0001   .0001   .0001   .0000   .0000   .0000   .0000   .0000   .0000   .0000
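The entries of Table A 2, and of Table A 3 that follows, can be regenerated from the standard normal density and the error function. The short Python sketch below uses only the standard `math` module, and the function names are ours. It offers a way to check a doubtful or illegible entry.

```python
import math


def normal_ordinate(z):
    """Height of the standard normal curve at z (as in Table A 2)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_zero_to_z(z):
    """Area under the standard normal curve from 0 to z (as in Table A 3)."""
    return 0.5 * math.erf(z / math.sqrt(2))


for z in (0.65, 1.66, 1.96, 3.0):
    print(f"z = {z:4.2f}  ordinate = {normal_ordinate(z):.4f}  area 0 to z = {area_zero_to_z(z):.4f}")
```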
TABLE A 3

Cumulative Normal Frequency Distribution
(Area under the standard normal curve from 0 to Z)

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1    .0398   .0438   .0478   .0517   .0557   .0596   .0636   .0675   .0714   .0753
0.2    .0793   .0832   .0871   .0910   .0948   .0987   .1026   .1064   .1103   .1141
0.3    .1179   .1217   .1255   .1293   .1331   .1368   .1406   .1443   .1480   .1517
0.4    .1554   .1591   .1628   .1664   .1700   .1736   .1772   .1808   .1844   .1879
0.5    .1915   .1950   .1985   .2019   .2054   .2088   .2123   .2157   .2190   .2224
0.6    .2257   .2291   .2324   .2357   .2389   .2422   .2454   .2486   .2517   .2549
0.7    .2580   .2611   .2642   .2673   .2704   .2734   .2764   .2794   .2823   .2852
0.8    .2881   .2910   .2939   .2967   .2995   .3023   .3051   .3078   .3106   .3133
0.9    .3159   .3186   .3212   .3238   .3264   .3289   .3315   .3340   .3365   .3389
1.0    .3413   .3438   .3461   .3485   .3508   .3531   .3554   .3577   .3599   .3621
1.1    .3643   .3665   .3686   .3708   .3729   .3749   .3770   .3790   .3810   .3830
1.2    .3849   .3869   .3888   .3907   .3925   .3944   .3962   .3980   .3997   .4015
1.3    .4032   .4049   .4066   .4082   .4099   .4115   .4131   .4147   .4162   .4177
1.4    .4192   .4207   .4222   .4236   .4251   .4265   .4279   .4292   .4306   .4319
1.5    .4332   .4345   .4357   .4370   .4382   .4394   .4406   .4418   .4429   .4441
1.6    .4452   .4463   .4474   .4484   .4495   .4505   .4515   .4525   .4535   .4545
1.7    .4554   .4564   .4573   .4582   .4591   .4599   .4608   .4616   .4625   .4633
1.8    .4641   .4649   .4656   .4664   .4671   .4678   .4686   .4693   .4699   .4706
1.9    .4713   .4719   .4726   .4732   .4738   .4744   .4750   .4756   .4761   .4767
2.0    .4772   .4778   .4783   .4788   .4793   .4798   .4803   .4808   .4812   .4817
2.1    .4821   .4826   .4830   .4834   .4838   .4842   .4846   .4850   .4854   .4857
2.2    .4861   .4864   .4868   .4871   .4875   .4878   .4881   .4884   .4887   .4890
2.3    .4893   .4896   .4898   .4901   .4904   .4906   .4909   .4911   .4913   .4916
2.4    .4918   .4920   .4922   .4925   .4927   .4929   .4931   .4932   .4934   .4936
2.5    .4938   .4940   .4941   .4943   .4945   .4946   .4948   .4949   .4951   .4952
2.6    .4953   .4955   .4956   .4957   .4959   .4960   .4961   .4962   .4963   .4964
2.7    .4965   .4966   .4967   .4968   .4969   .4970   .4971   .4972   .4973   .4974
2.8    .4974   .4975   .4976   .4977   .4977   .4978   .4979   .4979   .4980   .4981
2.9    .4981   .4982   .4982   .4983   .4984   .4984   .4985   .4985   .4986   .4986

3.0 

.4987 

.4987 

4987 

.4988 

.4988 

4989 

.4989 

.4989 

.4990 

.4990 

3.1 

.4990 

.4991 

4991 

4991 

.4992 

4992 

.4992 

.4992 

4993 

.4993 

3.2 

4993 

.4993 

4994 

4994 

.4994 

4994 

.4994 

.4995 

4995 

.4995 

3.3 

.4995 

.4995 

4995 

.4996 

4996 

4996 

.4996 

.4996 

.4996 

.4997 

3:4 

.4997 

.4997 

4997 

.4997 

4997 

4997 

.4997 

.4997 

.4997 

.4998 

3.6 

39 

.4998 

5000 

.4998 

,4999 

.4999 

.4999 

.4999 

.4999 

.4999 

.4999 

.4999 
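Each entry of Table A 3 is the area between 0 and Z, which can be written with the error function as ½·erf(Z/√2). The sketch below assumes only the Python standard library; `area_zero_to_z` and `two_tailed_p` are hypothetical helper names. Doubling the area and subtracting it from 1 gives the two-tailed probability of a deviate at least as large as |Z|.

```python
import math


def area_zero_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (negative z gives a negative area)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


def two_tailed_p(z: float) -> float:
    """Probability that a standard normal deviate exceeds |z| in either direction."""
    return 1.0 - 2.0 * area_zero_to_z(abs(z))


for z in (1.0, 1.645, 1.96, 2.576):
    print(f"Z = {z:<6}  area 0 to Z = {area_zero_to_z(z):.4f}   two-tailed P = {two_tailed_p(z):.4f}")
```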
TABLE A 4

The Distribution of t* (Two-tailed Tests)

Degrees                     Probability of a Larger Value, Sign Ignored
of
Freedom   0.500   0.400   0.200   0.100    0.050    0.025    0.010    0.005    0.001
   1      1.000   1.376   3.078   6.314   12.706   25.452   63.657
   2      0.816   1.061   1.886   2.920    4.303    6.205    9.925   14.089   31.598
   3       .765   0.978   1.638   2.353    3.182    4.176    5.841    7.453   12.941
   4       .741    .941   1.533   2.132    2.776    3.495    4.604    5.598    8.610
   5       .727    .920   1.476   2.015    2.571    3.163    4.032    4.773    6.859
   6       .718    .906   1.440   1.943    2.447    2.969    3.707    4.317    5.959
   7       .711    .896   1.415   1.895    2.365    2.841    3.499    4.029    5.405
   8       .706    .889   1.397   1.860    2.306    2.752    3.355    3.832    5.041
   9       .703    .883   1.383   1.833    2.262    2.685    3.250    3.690    4.781
  10       .700    .879   1.372   1.812    2.228    2.634    3.169    3.581    4.587
  11       .697    .876   1.363   1.796    2.201    2.593    3.106    3.497    4.437
  12       .695    .873   1.356   1.782    2.179    2.560    3.055    3.428    4.318
  13       .694    .870   1.350   1.771    2.160    2.533    3.012    3.372    4.221
  14       .692    .868   1.345   1.761    2.145    2.510    2.977    3.326    4.140
  15       .691    .866   1.341   1.753    2.131    2.490    2.947    3.286    4.073
  16       .690    .865   1.337   1.746    2.120    2.473    2.921    3.252    4.015
  17       .689    .863   1.333   1.740    2.110    2.458    2.898    3.222    3.965
  18       .688    .862   1.330   1.734    2.101    2.445    2.878    3.197    3.922
  19       .688    .861   1.328   1.729    2.093    2.433    2.861    3.174    3.883
  20       .687    .860   1.325   1.725    2.086    2.423    2.845    3.153    3.850
  21       .686    .859   1.323   1.721    2.080    2.414    2.831    3.135    3.819
  22       .686    .858   1.321   1.717    2.074    2.406    2.819    3.119    3.792
  23       .685    .858   1.319   1.714    2.069    2.398    2.807    3.104    3.767
  24       .685    .857   1.318   1.711    2.064    2.391    2.797    3.090    3.745
  25       .684    .856   1.316   1.708    2.060    2.385    2.787    3.078    3.725
  26       .684    .856   1.315   1.706    2.056    2.379    2.779    3.067    3.707
  27       .684    .855   1.314   1.703    2.052    2.373    2.771    3.056    3.690
  28       .683    .855   1.313   1.701    2.048    2.368    2.763    3.047    3.674
  29       .683    .854   1.311   1.699    2.045    2.364    2.756    3.038    3.659
  30       .683    .854   1.310   1.697    2.042    2.360    2.750    3.030    3.646
  35       .682    .852   1.306   1.690    2.030    2.342    2.724    2.996    3.591
  40       .681    .851   1.303   1.684    2.021    2.329    2.704    2.971    3.551
  45       .680    .850   1.301   1.680    2.014    2.319    2.690    2.952    3.520
  50       .680    .849   1.299   1.676    2.008    2.310    2.678    2.937    3.496
  55       .679    .849   1.297   1.673    2.004    2.304    2.669    2.925    3.476
  60       .679    .848   1.296   1.671    2.000    2.299    2.660    2.915    3.460
  70       .678    .847   1.294   1.667    1.994    2.290    2.648    2.899    3.435
  80       .678    .847   1.293   1.665    1.989    2.284    2.638    2.887    3.416
  90       .678    .846   1.291   1.662    1.986    2.279    2.631    2.878    3.402
 100       .677    .846   1.290   1.661    1.982    2.276    2.625    2.871    3.390
 120       .677    .845   1.289   1.658    1.980    2.270    2.617    2.860    3.373
   ∞       .6745   .8416  1.2816  1.6448   1.9600   2.2414   2.5758   2.8070   3.2905

* Parts of this table are reprinted by permission from R. A. Fisher’s Statistical Methods
for Research Workers, published by Oliver and Boyd, Edinburgh (1925-1950); from Maxine
Merrington's "Table of Percentage Points of the t-Distribution," Biometrika, 32: 300 (1942);
and from Bernard Ostle's Statistics in Research, Iowa State University Press (1954).
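The last row of Table A 4 (∞ degrees of freedom) consists of standard normal deviates, so it can be reproduced from the normal distribution alone. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; it does not compute t values for finite degrees of freedom, for which the standard library has no quantile function. Its results agree with the ∞ row to within one unit in the fourth decimal (for P = 0.100 it prints 1.6449 against the tabulated 1.6448). The helper name `normal_two_tailed_critical` is hypothetical.

```python
from statistics import NormalDist


def normal_two_tailed_critical(p: float) -> float:
    """Deviate exceeded, sign ignored, with total probability p in a standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


for p in (0.500, 0.400, 0.200, 0.100, 0.050, 0.025, 0.010, 0.005, 0.001):
    print(f"P = {p:.3f}   critical value = {normal_two_tailed_critical(p):.4f}")
```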
* Condensed from table with 6 significant figures by Catherine M. Thompson, by permission of the Editor of Biometrika.
TABLE A 6

(i) Table for Testing Skewness

(One-tailed percentage points of the distribution of √b₁ = g₁ = m₃/m₂^(3/2))*

Size of      Percentage Points     Standard
Sample n     5%        1%          Deviation
   25        0.711     1.061       0.4354
   30        0.662     0.986        .4052
   35        0.621     0.923        .3804
   40        0.587     0.870        .3596
   45        0.558     0.825        .3418
   50        0.534     0.787        .3264
   60        0.492     0.723        .3009
   70        0.459     0.673        .2806
   80        0.432     0.631        .2638
   90        0.409     0.596        .2498
  100        0.389     0.567        .2377
  125        0.350     0.508        .2139
  150        0.321     0.464        .1961
  175        0.298     0.430        .1820
  200        0.280     0.403        .1706
  250        0.251     0.360        .1531
  300        0.230     0.329        .1400
  350        0.213     0.305        .1298
  400        0.200     0.285        .1216
  450        0.188     0.269        .1147
  500        0.179     0.255        .1089

* Since the distribution of √b₁ is symmetrical about zero, the percentage points repre-
sent 10% and 2% two-tailed values. Reproduced from Table 34 B of Tables for Statisticians
and Biometricians, Vol. 1, by permission of Dr. E. S. Pearson and the Biometrika Trustees.
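The statistic tested in Table A 6 (i) is built from moments about the sample mean. The sketch below assumes the usual definition m_k = Σ(X − X̄)^k / n (divisor n, not n − 1); the function name `sample_skewness_g1` is hypothetical. Under that assumption, a sample of 25 whose g₁ exceeds 0.711 (or falls below −0.711) would be significant at the one-tailed 5% level according to the table.

```python
import math
from typing import Sequence


def sample_skewness_g1(xs: Sequence[float]) -> float:
    """g1 = sqrt(b1) = m3 / m2**(3/2), moments taken about the sample mean with divisor n."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("all observations are equal; skewness is undefined")
    return m3 / (m2 * math.sqrt(m2))


print(sample_skewness_g1([1, 1, 1, 1, 6]))  # 1.5: a long tail to the right
```

The tiny data set is only a hand-checkable illustration; the table itself starts at n = 25.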


TABLE A 6 (Continued)

(ii) Table for Testing Kurtosis
(Percentage points of the distribution of b₂ = m₄/m₂²)*

Size of              Percentage Points
Sample n    Upper 1%   Upper 5%   Lower 5%   Lower 1%
   50         4.88       3.99       2.15       1.95
   75         4.59       3.87       2.27       2.08
  100         4.39       3.77       2.35       2.18
  125         4.24       3.71       2.40       2.24
  150         4.13       3.65       2.45       2.29
  200         3.98       3.57       2.51       2.37
  250         3.87       3.52       2.55       2.42
  300         3.79       3.47       2.59       2.46
  350         3.72       3.44       2.62       2.50
  400         3.67       3.41       2.64       2.52
  450         3.63       3.39       2.66       2.55
  500         3.60       3.37       2.67       2.57
  550         3.57       3.35       2.69       2.58
  600         3.54       3.34       2.70       2.60
  650         3.52       3.33       2.71       2.61
  700         3.50       3.31       2.72       2.62
  750         3.48       3.30       2.73       2.64
  800         3.46       3.29       2.74       2.65
  850         3.45       3.28       2.74       2.66
  900         3.43       3.28       2.75       2.66
  950         3.42       3.27       2.76       2.67
 1000         3.41       3.26       2.76       2.68
 1200         3.37       3.24       2.78       2.71
 1400         3.34       3.22       2.80       2.72
 1600         3.32       3.21       2.81       2.74
 1800         3.30       3.20       2.82       2.76
 2000         3.28       3.18       2.83       2.77

* Reproduced from Table 34 C of Tables for Statisticians and Biometricians, by permis-
sion of Dr. E. S. Pearson and the Biometrika Trustees.
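The kurtosis statistic b₂ = m₄/m₂² uses the same moments about the mean; it is about 3 for normal data, so values above the upper points indicate a peaked, long-tailed distribution and values below the lower points a flat-topped one. The sketch below assumes moments with divisor n, and the function name `sample_kurtosis_b2` is hypothetical. For a sample of 50, for instance, the table would treat b₂ above 4.88 or below 1.95 as significant at the 1% level in the corresponding tail.

```python
from typing import Sequence


def sample_kurtosis_b2(xs: Sequence[float]) -> float:
    """b2 = m4 / m2**2, moments about the sample mean with divisor n (about 3 for normal data)."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("all observations are equal; kurtosis is undefined")
    return m4 / (m2 * m2)


print(sample_kurtosis_b2([1, 2, 3, 4, 5]))  # 1.7: flatter than a normal curve
```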
TABLE A 7

(i) Significance Levels of t_w = (X̄ - μ)/w in Normal Samples. Two-tailed Test.
Divide P by 2 for a One-tailed Test*

Size of                Probability P
Sample      0.10      0.05      0.02      0.01
   2        3.157     6.353    15.910    31.828
   3        0.885     1.304     2.111     3.008
   4         .529     0.717     1.023     1.316
   5         .388      .507     0.685     0.843
   6         .312      .399      .523      .628
   7         .263      .333      .429      .507
   8         .230      .288      .366      .429
   9         .205      .255      .322      .374
  10         .186      .230      .288      .333
  11         .170      .210      .262      .302
  12         .158      .194      .241      .277
  13         .147      .181      .224      .256
  14         .138      .170      .209      .239
  15         .131      .160      .197      .224
  16         .124      .151      .186      .212
  17         .118      .144      .177      .201
  18         .113      .137      .168      .191
  19         .108      .131      .161      .182
  20         .104      .126      .154      .175

* Taken from more extensive tables by permission of E. Lord and the Editor of Biometrika.
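The statistic in Table A 7 (i) replaces the standard deviation in Student's t by a range. The sketch below assumes that w is the sample range (largest minus smallest observation); the data and the helper name `range_t` are hypothetical, and the four critical values for samples of 5 are copied from the table.

```python
from typing import Sequence

# Critical values for samples of 5, copied from Table A 7 (i); two-tailed P.
T_W_CRITICAL_N5 = {0.10: 0.388, 0.05: 0.507, 0.02: 0.685, 0.01: 0.843}


def range_t(sample: Sequence[float], mu: float) -> float:
    """t_w = (sample mean - mu) / w, with w taken to be the sample range (max - min)."""
    w = max(sample) - min(sample)
    if w == 0:
        raise ValueError("the range is zero; t_w is undefined")
    mean = sum(sample) / len(sample)
    return (mean - mu) / w


sample = [12, 15, 11, 14, 13]  # illustrative data, not from the book
t_w = range_t(sample, mu=10)
print(t_w)  # 0.75
for p, crit in sorted(T_W_CRITICAL_N5.items(), reverse=True):
    verdict = "significant" if abs(t_w) >= crit else "not significant"
    print(f"P = {p:.2f}: critical {crit:.3f} -> {verdict}")
```

Here 0.75 exceeds the 5% point 0.507 but not the 1% point 0.843. For a one-tailed test the probabilities are halved, as the table heading states.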
TABLE A 7 (Continued)

(ii) Significance Levels of (X̄₁ - X̄₂)/½(w₁ + w₂) for Two Normal
Samples of Equal Sizes.* Two-tailed Test

Size of                Probability P
Sample      0.10      0.05      0.02      0.01
   2        2.322     3.427     5.553     7.916
   3        0.974     1.272     1.715     2.093
   4         .644     0.813     1.047     1.237
   5         .493      .613     0.772     0.896
   6         .405      .499      .621      .714
   7         .347      .426      .525      .600
   8         .306      .373      .459      .521
   9         .275      .334      .409      .464
  10         .250      .304      .371      .419
  11         .233      .280      .340      .384
  12         .214      .260      .315      .355
  13         .201      .243      .294      .331
  14         .189      .228      .276      .311
  15         .179      .216      .261      .293
  16         .170      .205      .247      .278
  17         .162      .195      .236      .264
  18         .155      .187      .225      .252
  19         .149      .179      .216      .242
  20         .143      .172      .207      .232

* From more extensive tables by permission of E. Lord and the Editor of Biometrika.
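For two samples of the same size, the statistic divides the difference between the means by the average of the two ranges. The sketch below assumes w₁ and w₂ are the sample ranges. The data and the helper name `range_two_sample` are hypothetical.

```python
from typing import Sequence


def range_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """(mean_a - mean_b) / (0.5 * (w_a + w_b)), w being each sample's range; equal sizes assumed."""
    if len(a) != len(b):
        raise ValueError("Table A 7 (ii) applies only to samples of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    mean_range = 0.5 * ((max(a) - min(a)) + (max(b) - min(b)))
    if mean_range == 0:
        raise ValueError("both ranges are zero; the statistic is undefined")
    return (mean_a - mean_b) / mean_range


a = [14, 17, 15, 18, 16]  # illustrative data, not from the book
b = [12, 13, 11, 14, 15]
print(range_two_sample(a, b))  # 0.75, compared with 0.613 (P = 0.05) for samples of 5
```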


TABLE A 8

Numbers of Like Signs Required for Significance in the Sign Test,
With Actual Significance Probabilities. Two-tailed Test

No. of          Significance Level
Pairs      1%          5%          10%
   5                               0(.062)
   6                   0(.031)     0(.031)
   7                   0(.016)     0(.016)
   8       0(.008)     0(.008)     1(.070)
   9       0(.004)     1(.039)     1(.039)
  10       0(.002)     1(.021)     1(.021)
  11       0(.001)     1(.012)     2(.065)
  12       1(.006)     2(.039)     2(.039)
  13       1(.003)     2(.022)     3(.092)
  14       1(.002)     2(.013)     3(.057)
  15       2(.007)     3(.035)     3(.035)
  16       2(.004)     3(.021)     4(.077)
  17       2(.002)     4(.049)     4(.049)
  18       3(.008)     4(.031)     5(.096)
  19       3(.004)     4(.019)     5(.063)
  20       3(.003)     5(.041)     5(.041)
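The probabilities in parentheses are binomial tail areas. Assuming that, under the null hypothesis, each nonzero difference is equally likely to be + or −, the count of the less frequent sign follows a binomial distribution with p = ½, and doubling the lower tail gives the two-tailed probability. The sketch below needs Python 3.8 or later for `math.comb`; the helper name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """Exact two-tailed P of k or fewer of the less frequent sign among n nonzero differences."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * one_tail)


for k, n in [(0, 5), (1, 8), (3, 13), (5, 20)]:
    print(f"{k} like signs out of {n} pairs: P = {sign_test_p(k, n):.3f}")
```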
TABLE A 9

Sum of Ranks at Approximate 5% and 1% Levels of P.* These Numbers
or Smaller Indicate Rejection. Two-tailed Test

Number of Pairs    5% Level      1% Level
       7           2(0.047)      0(0.016)
       8           2(0.024)      0(0.008)
       9           6(0.054)      2(0.009)
      10           8(0.049)      3(0.010)
      11          11(0.053)      5(0.009)
      12          14(0.054)      7(0.009)
      13          17(0.050)     10(0.010)
      14          21(0.054)     13(0.011)
      15          25(0.054)     16(0.010)
      16          29(0.053)     19(0.009)

* The figures in parentheses are the actual significance probabilities. Adapted from
the article by Wilcoxon (2, Chapter 5).
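Exact probabilities for the signed-rank sum can be obtained by counting how many of the 2ⁿ equally likely sign patterns give a rank sum of T or less. The sketch below assumes no zero differences and no tied ranks, and counts subsets of the ranks 1, ..., n by dynamic programming; the helper name `signed_rank_p` is hypothetical. Its results can be compared with the parenthesized probabilities in Table A 9, for example 0.047 for T = 2 with 7 pairs.

```python
def signed_rank_p(t: int, n: int) -> float:
    """Exact two-tailed P that the smaller signed-rank sum is t or less, for n pairs.

    Assumes no zero differences and no tied ranks, so that under the null
    hypothesis every one of the 2**n sign patterns is equally likely.
    """
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks 1..n whose sum is s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):  # descending, so each rank is used once
            counts[s] += counts[s - rank]
    one_tail = sum(counts[: t + 1]) / 2 ** n
    return min(1.0, 2.0 * one_tail)


for t, n in [(2, 7), (8, 10), (3, 10)]:
    print(f"T = {t}, {n} pairs: P = {signed_rank_p(t, n):.3f}")
```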


TABLE A 10

Wilcoxon’s Two-Sample Rank Test (The Mann-Whitney Test).
Values of T at Two Levels

(These values or smaller cause rejection. Two-tailed test. Take n₁ ≤ n₂.*)

0.05 Level of T

TABLE A 10 (Continued)

0.01 Level of T

* n₁ and n₂ are the numbers of cases in the two groups. If the groups are unequal in
size, n₁ refers to the smaller.

Table is reprinted from White (12, Chapter 5), who extended the method of Wilcoxon.
TABLE A 11

Correlation Coefficients at the 5% and 1% Levels of Significance

Degrees of
Freedom       5%      1%
     1       .997   1.000
     2       .950    .990
     3       .878    .959
     4       .811    .917
     5       .754    .874
     6       .707    .834
     7       .666    .798
     8       .632    .765
     9       .602    .735
    10       .576    .708
    11       .553    .684
    12       .532    .661
    13       .514    .641
    14       .497    .623
    15       .482    .606
    16       .468    .590
    17       .456    .575
    18       .444    .561
    19       .433    .549
    20       .423    .537
    21       .413    .526
    22       .404    .515
    23       .396    .505
    24       .388    .496
    25       .381    .487
    26       .374    .478
    27       .367    .470
    28       .361    .463
    29       .355    .456
    30       .349    .449
    35       .325    .418
    40       .304    .393
    45       .288    .372
    50       .273    .354
    60       .250    .325
    70       .232    .302
    80       .217    .283
    90       .205    .267
   100       .195    .254
   125       .174    .228
   150       .159    .208
   200       .138    .181
   300       .113    .148
   400       .098    .128
   500       .088    .115
 1,000       .062    .081

Portions of this table were taken from Table VA in Statistical Methods for Research
Workers by permission of Professor R. A. Fisher and his publishers, Oliver and Boyd.
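The entries of Table A 11 can be related to the t table. Assuming the usual test of a correlation coefficient, t = r√df / √(1 − r²) with the same degrees of freedom, solving for r gives the smallest significant correlation as r = t / √(t² + df). The sketch below applies this to three 5% values of t copied from Table A 4; the helper name `critical_r` is hypothetical.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| significant at the level of t_crit, from t = r * sqrt(df) / sqrt(1 - r**2)."""
    return t_crit / math.sqrt(t_crit * t_crit + df)


# t values for P = 0.05 taken from Table A 4.
for df, t_crit in [(1, 12.706), (10, 2.228), (20, 2.086)]:
    print(f"df = {df:>2}: r = {critical_r(t_crit, df):.3f}")
```

The printed values, .997, .576 and .423, match the 5% column above.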
TABLE A 12

Table of z = ½ logₑ (1 + r)/(1 - r) to Transform the Correlation Coefficient

r      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
.0     0.000  0.010  0.020  0.030  0.040  0.050  0.060  0.070  0.080  0.090
.1      .100   .110   .121   .131   .141   .151   .161   .172   .182   .192
.2      .203   .213   .224   .234   .245   .255   .266   .277   .288   .299
.3      .310   .321   .332   .343   .354   .365   .377   .388   .400   .412
.4      .424   .436   .448   .460   .472   .485   .497   .510   .523   .536
.5      .549   .563   .576   .590   .604   .618   .633   .648   .662   .678
.6      .693   .709   .725   .741   .758   .775   .793   .811   .829   .848
.7      .867   .887   .908   .929   .950   .973   .996  1.020  1.045  1.071
.8     1.099  1.127  1.157  1.188  1.221  1.256  1.293  1.333  1.376  1.422

r      0.000  0.001  0.002  0.003  0.004  0.005  0.006  0.007  0.008  0.009
.90    1.472  1.478  1.483  1.488  1.494  1.499  1.505  1.510  1.516  1.522
.91    1.528  1.533  1.539  1.545  1.551  1.557  1.564  1.570  1.576  1.583
.92    1.589  1.596  1.602  1.609  1.616  1.623  1.630  1.637  1.644  1.651
.93    1.658  1.666  1.673  1.681  1.689  1.697  1.705  1.713  1.721  1.730
.94    1.738  1.747  1.756  1.764  1.774  1.783  1.792  1.802  1.812  1.822
.95    1.832  1.842  1.853  1.863  1.874  1.886  1.897  1.909  1.921  1.933
.96    1.946  1.959  1.972  1.986  2.000  2.014  2.029  2.044  2.060  2.076
.97    2.092  2.109  2.127  2.146  2.165  2.185  2.205  2.227  2.249  2.273
.98    2.298  2.323  2.351  2.380  2.410  2.443  2.477  2.515  2.555  2.599
.99    2.646  2.700  2.759  2.826  2.903  2.994  3.106  3.250  3.453  3.800
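The transformation in Table A 12 is the inverse hyperbolic tangent, so any entry can be recomputed directly from the formula in the heading. The sketch below assumes only the Python standard library; the helper name `fisher_z` is hypothetical, and `math.atanh` is printed alongside as an independent check.

```python
import math


def fisher_z(r: float) -> float:
    """z = (1/2) * ln((1 + r) / (1 - r)), the inverse hyperbolic tangent of r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


for r in (0.25, 0.5, 0.9, 0.99):
    print(f"r = {r:<5} z = {fisher_z(r):.3f}   (math.atanh gives {math.atanh(r):.3f})")
```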
TABLE A 13
Table of r in Terms of z*

z      0.00  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09
0.0    0.000 0.010 0.020 0.030 0.040 0.050 0.060 0.070 0.080 0.090
.1     .100  .110  .119  .129  .139  .149  .159  .168  .178  .187
.2     .197  .207  .216  .226  .236  .245  .254  .264  .273  .282
.3     .291  .300  .310  .319  .327  .336  .345  .354  .363  .371
.4     .380  .389  .397  .405  .414  .422  .430  .438  .446  .454
.5     .462  .470  .478  .485  .493  .500  .508  .515  .523  .530
.6     .537  .544  .551  .558  .565  .572  .578  .585  .592  .598
.7     .604  .611  .617  .623  .629  .635  .641  .647  .653  .658
.8     .664  .670  .675  .680  .686  .691  .696  .701  .706  .711
.9     .716  .721  .726  .731  .735  .740  .744  .749  .753  .757
1.0    .762  .766  .770  .774  .778  .782  .786  .790  .793  .797
1.1    .800  .804  .808  .811  .814  .818  .821  .824  .828  .831
1.2    .834  .837  .840  .843  .846  .848  .851  .854  .856  .859
1.3    .862  .864  .867  .869  .872  .874  .876  .879  .881  .883
1.4    .885  .888  .890  .892  .894  .896  .898  .900  .902  .903
1.5    .905  .907  .909  .910  .912  .914  .915  .917  .919  .920
1.6    .922  .923  .925  .926  .928  .929  .930  .932  .933  .934
1.7    .935  .937  .938  .939  .940  .941  .942  .944  .945  .946
1.8    .947  .948  .949  .950  .951  .952  .953  .954  .954  .955
1.9    .956  .957  .958  .959  .960  .960  .961  .962  .963  .963
2.0    .964  .965  .965  .966  .967  .967  .968  .969  .969  .970
2.1    .970  .971  .972  .972  .973  .973  .974  .974  .975  .975
2.2    .976  .976  .977  .977  .978  .978  .978  .979  .979  .980
2.3    .980  .980  .981  .981  .982  .982  .982  .983  .983  .983
2.4    .984  .984  .984  .985  .985  .985  .986  .986  .986  .986
2.5    .987  .987  .987  .987  .988  .988  .988  .988  .989  .989
2.6    .989  .989  .989  .990  .990  .990  .990  .990  .991  .991
2.7    .991  .991  .991  .992  .992  .992  .992  .992  .992  .992
2.8    .993  .993  .993  .993  .993  .993  .993  .994  .994  .994
2.9    .994  .994  .994  .994  .994  .995  .995  .995  .995  .995

* r = (e^(2z) - 1)/(e^(2z) + 1).
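The back-transformation in the footnote of Table A 13 is the hyperbolic tangent. The sketch below evaluates the footnote formula literally and is a hedged illustration only; the helper name `r_from_z` is hypothetical. For very large z (far beyond the table's range) `math.exp` would overflow, and `math.tanh` is the safer choice.

```python
import math


def r_from_z(z: float) -> float:
    """Back-transform z to r: r = (e**(2z) - 1) / (e**(2z) + 1), which equals tanh(z)."""
    e2z = math.exp(2.0 * z)  # overflows only for z of several hundred, far outside the table
    return (e2z - 1.0) / (e2z + 1.0)


for z in (0.5, 0.97, 1.3, 2.3):
    print(f"z = {z:<5} r = {r_from_z(z):.3f}")
```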



TABLE A 14, Part I 

5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F 
TABLE A 16

Angles Corresponding to Percentages, Angle = Arcsin √Percentage,
as Given by C. I. Bliss*
                     25.25  25.33  25.40  25.48  25.55  25.62  25.70  25.77

                                                 31.05  31.11  31.18
27     31.31  31.37  31.44  31.50  31.56  31.63  31.69  31.76  31.82  31.88
28     31.95  32.01  32.08  32.14  32.20  32.27  32.33  32.39  32.46  32.52
29     32.58  32.65  32.71  32.77  32.83  32.90  32.96  33.02  33.09  33.15

* We are indebted to Dr. C. I. Bliss for permission to reproduce this table, which
appeared in Plant Protection, No. 12, Leningrad (1937).

(Table A 16 continued on pp. 570-71)
TABLE A 16 (Continued)

%      0      1      2      3      4      5      6      7      8      9
39                                                                 39.17
40     39.23  39.29  39.35  39.41  39.47  39.52  39.58  39.64  39.70  39.76
41     39.82  39.87  39.93  39.99  40.05  40.11  40.16  40.22  40.28  40.34
42     40.40  40.46  40.51  40.57  40.63  40.69  40.74  40.80  40.86  40.92
43     40.98  41.03  41.09  41.15  41.21  41.27  41.32  41.38  41.44  41.50
44     41.55  41.61  41.67  41.73  41.78  41.84  41.90  41.96  42.02  42.07
45     42.13  42.19  42.25  42.30  42.36  42.42  42.48  42.53  42.59  42.65
46     42.71  42.76  42.82  42.88  42.94  42.99  43.05  43.11  43.17  43.22
47     43.28  43.34  43.39  43.45  43.51  43.57  43.62  43.68  43.74  43.80
48     43.85  43.91  43.97  44.03  44.08  44.14  44.20  44.25  44.31  44.37
49     44.43  44.48  44.54  44.60  44.66  44.71  44.77  44.83  44.89  44.94
50     45.00  45.06  45.11  45.17  45.23  45.29  45.34  45.40  45.46  45.52
51     45.57  45.63  45.69  45.75  45.80  45.86  45.92  45.97  46.03  46.09
52     46.15  46.20  46.26  46.32  46.38  46.43  46.49  46.55  46.61  46.66
53     46.72  46.78  46.83  46.89  46.95  47.01  47.06  47.12  47.18  47.24
54     47.29  47.35  47.41  47.47  47.52  47.58  47.64  47.70  47.75  47.81
55     47.87  47.93  47.98  48.04  48.10  48.16  48.22  48.27  48.33  48.39
56     48.45  48.50  48.56  48.62  48.68  48.73  48.79  48.85  48.91  48.97
57     49.02  49.08  49.14  49.20  49.26  49.31  49.37  49.43  49.49  49.54
58     49.60  49.66  49.72  49.78  49.84  49.89  49.95  50.01  50.07  50.13
59     50.18  50.24  50.30  50.36  50.42  50.48  50.53  50.59  50.65  50.71
60     50.77  50.83  50.89  50.94  51.00  51.06  51.12  51.18  51.24  51.30
61     51.35  51.41  51.47  51.53  51.59  51.65  51.71  51.77  51.83  51.88
62     51.94  52.00  52.06  52.12  52.18  52.24  52.30  52.36  52.42  52.48
63     52.53  52.59  52.65  52.71  52.77  52.83  52.89  52.95  53.01  53.07
64     53.13  53.19  53.25  53.31  53.37  53.43  53.49  53.55  53.61  53.67
65     53.73  53.79  53.85  53.91  53.97  54.03  54.09  54.15  54.21  54.27
66     54.33  54.39  54.45  54.51  54.57  54.63  54.70  54.76  54.82  54.88
67     54.94  55.00  55.06  55.12  55.18  55.24  55.30  55.37  55.43  55.49
68     55.55  55.61  55.67  55.73  55.80  55.86  55.92  55.98  56.04  56.11
69     56.17  56.23  56.29  56.35  56.42  56.48  56.54  56.60  56.66  56.73
TABLE A 16 (Continued)

%      0      1      2      3      4      5      6      7      8      9
70     56.79  56.85  56.91  56.98  57.04  57.10  57.17  57.23  57.29  57.35
71     57.42  57.48  57.54  57.61  57.67  57.73  57.80  57.86  57.92  57.99
72     58.05  58.12  58.18  58.24  58.31  58.37  58.44  58.50  58.56  58.63
73     58.69  58.76  58.82  58.89  58.95  59.02  59.08  59.15  59.21  59.28
74     59.34  59.41  59.47  59.54  59.60  59.67  59.74  59.80  59.87  59.93
75     60.00  60.07  60.13  60.20  60.27  60.33  60.40  60.47  60.53  60.60
76     60.67  60.73  60.80  60.87  60.94  61.00  61.07  61.14  61.21  61.27
77     61.34  61.41  61.48  61.55  61.62  61.68  61.75  61.82  61.89  61.96
78     62.03  62.10  62.17  62.24  62.31  62.37  62.44  62.51  62.58  62.65
79     62.72  62.80  62.87  62.94  63.01  63.08  63.15  63.22  63.29  63.36
80     63.44  63.51  63.58  63.65  63.72  63.79  63.87  63.94  64.01  64.08
81     64.16  64.23  64.30  64.38  64.45  64.52  64.60  64.67  64.75  64.82
82     64.90  64.97  65.05  65.12  65.20  65.27  65.35  65.42  65.50  65.57
83     65.65  65.73  65.80  65.88  65.96  66.03  66.11  66.19  66.27  66.34
84     66.42  66.50  66.58  66.66  66.74  66.81  66.89  66.97  67.05  67.13
85     67.21  67.29  67.37  67.45  67.54  67.62  67.70  67.78  67.86  67.94
86     68.03  68.11  68.19  68.28  68.36  68.44  68.53  68.61  68.70  68.78
87     68.87  68.95  69.04  69.12  69.21  69.30  69.38  69.47  69.56  69.64
88     69.73  69.82  69.91  70.00  70.09  70.18  70.27  70.36  70.45  70.54
89     70.63  70.72  70.81  70.91  71.00  71.09  71.19  71.28  71.37  71.47
90     71.56  71.66  71.76  71.85  71.95  72.05  72.15  72.24  72.34  72.44
91     72.54  72.64  72.74  72.84  72.95  73.05  73.15  73.26  73.36  73.46
92     73.57  73.68  73.78  73.89  74.00  74.11  74.21  74.32  74.44  74.55
93     74.66  74.77  74.88  75.00  75.11  75.23  75.35  75.46  75.58  75.70
94     75.82  75.94  76.06  76.19  76.31  76.44  76.56  76.69  76.82  76.95
95     77.08  77.21  77.34  77.48  77.61  77.75  77.89  78.03  78.17  78.32
96     78.46  78.61  78.76  78.91  79.06  79.22  79.37  79.53  79.69  79.86
97     80.02  80.19  80.37  80.54  80.72  80.90  81.09  81.28  81.47  81.67
98     81.87  82.08  82.29  82.51  82.73  82.96  83.20  83.45  83.71  83.98
99.0   84.26  84.29  84.32  84.35  84.38  84.41  84.44  84.47  84.50  84.53
99.1   84.56  84.59  84.62  84.65  84.68  84.71  84.74  84.77  84.80  84.84
99.2   84.87  84.90  84.93  84.97  85.00  85.03  85.07  85.10  85.13  85.17
99.3   85.20  85.24  85.27  85.31  85.34  85.38  85.41  85.45  85.48  85.52
99.4   85.56  85.60  85.63  85.67  85.71  85.75  85.79  85.83  85.87  85.91
99.5   85.95  85.99  86.03  86.07  86.11  86.15  86.20  86.24  86.28  86.33
99.6   86.37  86.42  86.47  86.51  86.56  86.61  86.66  86.71  86.76  86.81
99.7   86.86  86.91  86.97  87.02  87.08  87.13  87.19  87.25  87.31  87.37
99.8   87.44  87.50  87.57  87.64  87.71  87.78  87.86  87.93  88.01  88.10
99.9   88.19  88.28  88.38  88.48  88.60  88.72  88.85  89.01  89.19  89.43
100.0  90.00
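Each entry of Table A 16 is the angle, in degrees, whose sine is the square root of the proportion, as stated in the table heading. The sketch below recomputes entries from that formula using only the standard library; the helper name `arcsin_angle` is hypothetical.

```python
import math


def arcsin_angle(percentage: float) -> float:
    """Angle in degrees whose sine is sqrt(percentage / 100)."""
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return math.degrees(math.asin(math.sqrt(percentage / 100.0)))


for pct in (18.2, 40.0, 50.0, 72.7, 99.9, 100.0):
    print(f"{pct:5.1f}%  ->  {arcsin_angle(pct):.2f} degrees")
```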
Orthogonal Polynomials
Table of Square Roots
of two proportions in independent samples, 215-223

of two proportions in paired samples, 
213-215 
orthogonal, 309 

rule for standard error, 269, 301-302 
Components of variance, 280 
in factorial experiments, 364-369 
in three-stage sampling, 285-288, 291-294 
in two-stage sampling, 280-285, 289-291, 
529-533 

confidence limits, 284-285 


Compound interest law, 447 
Confidence intervals, 5-7, 14-15, 29 
for an individual Y , given X , 155-157 
for binomial proportion, 210-211 
for components of variance, 284-285 
for correlation coefficient, 185-188
for partial regression coefficients, 391 
for population mean (σ known), 56
for population mean (σ unknown), 61, 122
for population median, 124-125 
for population regression line, 153-155
for population variance, 74-76 
for ratio of two variances, 197 
for slope in regression, 153
one-sided, or one-tailed, 57 
table for binomial distribution, 6-7 
upper and lower, 58 

Confidence limits, 5-7. See also Confidence 
intervals. 

upper and lower, 58 
Contingency table 
R x C, 250-252 
2 x C, 238-243 
2 x 2, 215-223
sets of 2 x 2 tables, 253-256 
Continuity correction, 125, 209-210, 230-231

Continuous distribution, 23 
Correction 

for continuity, 125, 209-210 
for finite size of population, 513 
for mean, 261-262 
for working mean, 47-48 
Sheppard’s, 83 
Correlation 

and common elements, 181-183 
calculation in large sample, 190-193 
coefficient, 172 

combination of separate estimates, 187 
comparison of several coefficients, 186 
confidence interval for, 185 
tables, 557-559 
tests of significance, 184-188 
intraclass, 294 
multiple, 402 
nonsense, 189 
partial, 400-401 
rank, 193-195 

relation to bivariate normal distribution, 
177-179 

relation to regression, 175-177 
role in selection, 189 
role in stream extension, 189 
utility of, 188-190 

Covariance, 181. See also Analysis of covariance.

Curve fitting, 447-471 
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Degrees of freedom, 45 
for chi-square 

in contingency tables, 217, 239, 251 
in goodness of fit tests, 237
in tests of homogeneity of variance, 297 
in analysis of variance 
Latin square, 314 
one-way classification, 261 
two-way classification, 301, 307 
in correlation, 184
in regression, 138, 145, 162-163, 385 
Deletion of a variable, 412 
Dependent variable in regression, 135 
Design of investigations 
comparison of paired and independent 
samples, 106-109 
efficiency of blocking, 311-312 
factorial experiments, 339-364 
independent samples, 91, 100-106, 114-116, 258-275
Latin squares, 312-317 
Missing data, 317-321 
paired samples, 91-99 
perennial crops, 377-379 
randomized blocks or groups, 299-310 
role of randomization, 109-111
sample size, 111-114, 221-223
sample surveys, 504 
series of experiments, 375-377 
two-stage (split-plot or nested) designs, 
369-375 

use of covariance, 419-432 
use of regression, 135
Deviations 

from sample mean, 42 
Digits 
random, 12 
table of, 543-546 
Discrete distribution, 16 
Discriminant function, 414 
computations, 416-418 
relation to multiple regression, 416 
uses, 414 

Distance between populations, 415 
Distribution. See also the specific distribution.

binomial, 17 
bivariate normal, 177 
chi-square, 73 
F (variance ratio), 117 
multinomial, 235 
normal, 32
Poisson, 223 
Student’s t, 59
Dummy variable, 416 

Efficiency of 


analysis of covariance, 423-424, 427 

Latin squares, 316

randomized blocks, 311 

range, 46 

rank tests, 132 

sign test, 127 

Equally likely outcomes, 199 
Error 

of first kind (Type I), 27, 31 
of measurement 

effect on estimates in regression, 164-166

of second kind, (Type II), 27, 31 
regression, 421 

standard (See Standard error.) 

Estimate or estimator 
interval, 5, 29 
point, 5, 29 
unbiased, 45, 506 

Expected numbers, 20, 216, 228-240 
minimum size for χ² tests, 235, 241
Experiment. See Design of investigations. 
Experimental sampling, used to illustrate 
binomial confidence limits, 14 
binomial frequency distribution, 16 
central limit theorem, 51-55 
chi-square (1 d.f.) for binomial, 22-26
confidence interval for population mean μ, 78-79

distribution of sample means from a normal distribution, 70-72
distribution of sample standard deviation 
s, 72-73 

distribution of sample variance s², 72-73
F-distribution, 266
t-distribution, 77-78
Exponential 
decay curve, 447 
growth curve, 447, 449-453 
Extrapolation, 144, 456 


F-distribution, 117
effect of correlated errors, 323 
effect of heterogeneous errors, 324 
effect of non-normality, 325 
one-tailed tables, 560-567 
two-tailed table, 117 
Factor, 339 

Factorial experiment, 339 
analysis of 2² factorial
interaction absent, 342-344 
interaction present, 344-346 
analysis of 2³ factorial, 359-361
analysis of general three-factor experiment, 361-364

analysis of general two-factor experiment, 346-349
compared with single-factor experiment, 339-342

fitting of response curves to treatments, 
349-354 

fitting of response surface, 354-358 
Finite population correction, 513 
First-order reaction curve, 448. See also 
Asymptotic regression. 

Fixed effects model 
in factorial experiments, 364-369 
in one-way classification, 275 
Fourfold (2 x 2) table, 215 
Freedom, degrees of. See Degrees of freedom.

Frequency 
class, 23 
cumulative, 26 
distribution, 16, 30 
continuous, 23 
discrete, 16 

number of classes needed, 80-81
expected, 20 
observed, 20 

g₁ and g₂ tests for non-normality, 86-87
Genetic ratios 
tests of, 228-231, 248-249 
Geometric mean, 330 

Goodness of fit test, χ², 84. See also Chi-square.

Graphical representation, 16, 40 
Grouping 

loss of accuracy due to, 81 
Growth curve 
exponential, 449 
logistic, 448-449 

Harmonic mean, 475 
Heterogeneity 
chi-square, 248 
of variances, 296, 324 
Hierarchal classifications, 285-289
Histogram, 25 
Homogeneity, test of 
in binomial proportions, 240 
in Poisson counts, 231 
in regression coefficients, 432 
of between- and within-class regressions, 
436 

Hotelling’s T²-test, 414, 417
Hypotheses about populations, 20. See 
Tests of significance, 
null, 26, 30 
tests of 

Independence 
assumption of 


in analysis of variance, 323 
in binomial distribution, 201 
in probability, 201 
with attributes, 219 
Independent samples 
comparison of two means, 100-105, 114-116

comparison of two proportions, 215-223 
Independent variable in regression, 135 
Inferences about population, 3-9, 29, 504- 
505. See also Confidence intervals. 
Interaction, 341 
possible reasons for, 346 
three-factor, 359-364 
in contingency tables, 496 
two-factor, 341-349, 473 
Interpolation in tables, 541 
Interval estimate, 5, 29. See also Confidence
interval. 

Intraclass correlation, 294-296 
Inverse matrix, 389, 403, 409-412 
Kendall’s τ, 194
Kurtosis, 86

effect on variance of s², 89
test for, 86-88 
table, 552

Latin square, 312 
efficiency, 316 

model and analysis of variance, 312-315 
rejection of observations, 321-323 
test of additivity, 334-337 
Least significant difference, 272
Least squares, method of, 147 
as applied to regression, 147 
Gauss theorem, 147 

in two-way tables with unequal numbers, 
483-493 

Level of significance, 27 

Limits, confidence. See Confidence intervals. 

Likelihood, maximum, 495 

Linear calibration, 159-160 

Linear regression. See Regression.

Listing, 509-511 

Logarithm 

common and natural, 451-452 
Logarithmic 
graph paper, 450, 452 
transformation, 329-330
Logistic growth law, 448-449 
Logit transformation, 494, 497-503 
Lognormal distribution, 276 

Main effect, 340-342 
Main plot, 369 
Mann-Whitney test, 130
significance levels, 131, 555-556 
Mantel-Haenszel test, 255-256
Mathematical model for 
analysis of covariance, 419 
exponential growth curve, 449 
factorial experiment, 357, 364-369 
Latin square, 313 
logistic growth curve, 448-449 
multiple regression, 382, 394 
nested (split-plot) designs, 370 
one-way classification 
fixed effects, 275 
mixed effects, 288 
random effects, 279, 289 
orthogonal polynomials, 460-465 
regression, 141-144

asymptotic, 468 
non-linear, 465 

two-way classification, 302-308, 473 
Matrix, 390 

inverse, 390, 409, 439, 490 
Maximin method, 246 
Maximum likelihood, 495
Mean 

absolute deviation, 44 
adjusted, 421, 429 
arithmetic, 39 
correction for, 261-262 
distribution of, 51 
geometric, 330 
harmonic, 475 
weighted, 186, 438, 521 
Mean square, 44 
expected value 

in factorial experiments, 364-369 
with proportional sub-class numbers, 
481-482 

Mean square error 
in sampling finite populations, 506 
Measurement data, 29 
Median, 123

calculation from large sample, 123 
confidence interval, 124-125 
distribution of sample median, 124 
Mendelian inheritance 
Heterogeneity χ² test, 248-249
test of specified frequencies, 228-231 
Missing data 
in Latin square, 319-320 
in one-way classification, 317
in two-way classification, 317-321 
Mitscherlich’s law, 447. See also Asymptotic 
regression. 

Mixed effects model 
in factorial experiments, 364-369 
in nested classifications, 288-289 
Mode, 124 


Model. See Mathematical model. 

Model I, fixed effects. See Fixed effects 
model. 

Model II, random effects. See Random 
effects model. 

Moment about mean, 86 
Monte Carlo method, 13 
Multinomial distribution, 235
Multiple comparisons, 271-275

Multiple covariance. See Analysis of co- 
variance. 

Multiple regression. See Regression. 
Multiplication rule of probability, 201 
Multivariate t-test, 414, 417
Mutually exclusive outcomes, 200 

Nested 

classifications, 285-289, 291-294 
designs, 369 

Newman-Keuls test, 273-275 
Non-additivity 

effects of in analysis of variance, 330-331 
removal by transformation, 329, 331 
tests for 

in Latin square, 334-337 
in two-way classification, 331-334 
Non-parametric methods
Mann-Whitney test, 130
median and percentiles, 123-125 
rank correlation, 193-195 
sign test, 127 

Wilcoxon signed rank test, 128 
Normal distribution, 32 
formula for ordinate, 34 
mean, 32 

method of fitting to observed data, 70-72 
reasons for use of, 35 
relation to binomial, 32, 209-213 
standard deviation, 32 
table of cumulative distribution, 548 
table of ordinates, 547 
tests of normality, 86-88 
Normal equations, 383 
in multiple regression, 383, 389, 403 
in two-way classifications, 488-491 
Normality, test of, 84-88 
Null hypothesis, 26, 30 

One-tailed tests, 76-77, 98-99 
One-way classification, frequencies 
expectations equal, 231-235, 242-243 
expectations estimated, 236-237 
expectations known, 228-231 
expectations small, 235 
One-way classification, measurements 
analysis of variance, 238-248 
comparisons among means, 268-275
effects of errors in assumptions, 276-277 
model I, fixed effects, 275 
model II, random effects, 279-285, 289-291

rejection of observations, 321-323 
samples of unequal sizes, 277-278 
Optimum allocation 
in stratified sampling, 523-526 
in three-stage sampling, 533 
in two-stage sampling, 531-533 
Ordered classifications 
methods of analysis, 243-246 
Order statistics, 123
Orthogonal comparisons, 309 
in analysis of factorial experiments, 346-361

Orthogonal polynomials, 349-351, 460-464 
tables of coefficients (values), 351, 572 
Outliers (suspiciously large deviations) 
in analysis of variance, 321 
in regression, 157


Paired samples, 91 
comparison of means, 93-95, 97-99 
comparison of proportions, 213-215 
conditions suitable for pairing, 97 
self-pairing, 91 

versus independent samples, 106-108 
Parabolic regression, 453-456 
Parameter, 32 
Partial 

correlation, 400 
coefficient, 400 
regression coefficient, 382 
interpretation of, 393-397 
standard, 398 
Pascal’s triangle, 204 

Percentages, analysis of. See Proportions, 
analysis of. 

Percentiles, 125 

estimation by order statistics, 125
Perennial experiments, 377-379
Placebo, 425 

Planned comparisons, 268-270 
Point estimate, 5, 29 
Poisson distribution, 223-226 
fitting to data, 224-225 
formula for, 223 
test of goodness of fit, 236-237 
variance test of homogeneity, 232-236 
Polynomial regression or response curve, 
349-354 

Pooling (combining) 
correlation coefficients, 187 
estimated differences in 2 x 2 tables, 254-256

estimates of variance, 101-103 
of classes for χ² tests, 235
regression coefficients, 438 
Population, 4, 29, 504-505 
finite, 504-505, 512-513
sampled, 15, 30 
target, 30 

Power function, 280 
Primary sampling units, 528 
Probability 

simple rules, 199-202, 219 
Probability sampling, 508-509 
Proportional sub-class numbers, method of, 
478-483 

Proportions, analysis of 
in one-way classifications, 240-243
test for a linear trend, 246-248 
in two-way classifications, 493 
in angular (arcsin) scale, 496 
in logit scale, 497-503 
in original (p) scale, 495-497
in sets of 2x2 tables, 253-256 

Random digits (numbers), 12-13, 30 
table, 543-546 
Random effects model 
in factorial experiments, 364-369 
in one-way classification, 279-294 
Randomization, 110 
as precaution against bias, 109-111
Randomization test (Fisher’s), 133 
Randomized blocks, 299. See also Two-way classifications.
efficiency of blocking, 311 
Random sampling, 10-11, 30 
stratified, 11
with replacement, 11
without replacement, 11, 505
Range, 39 

efficiency relative to standard deviation, 
46 

relation to standard deviation, 40 
Studentized Range test, 272-273 
t-test based on, 120
tables, 553-554 

use in comparison of means, 275 
Rank correlation, 193-195 
Ranks, 128 

efficiency relative to normal tests, 132 
rank sum test, 130-132 
signed rank test, 128-130 
Ratio 

estimates in sample surveys, 536-537 
estimation of, 170
standard error of, 241, 515, 537 
Rectangular (uniform) distribution, 81
Rectification, 449
Regression, 135 

analysis of variance for, 160-163 
coefficient (slope), 136 
interval estimate of, 153 
value in some simple cases, 147-148 
comparison of “between classes” and 
“within classes” regressions, 436-438 
comparison of regression lines, 432-436 
confidence interval for slope, 153
deviations from, 138 
effects of errors in X, 164-166
estimated regression line, 144-145 
estimated residual variance, 145-146 
estimates in sample surveys, 537-538 
equation, 136 

historical origin of the term, 164 
in one-way classification of frequencies,
234 

line through origin, 166-169 
linear regression of proportions, 246-248 
mathematical model, 141-144 
multiple, 381 

computations in fitting, 383-393, 403- 
412 

deletion of an independent variable, 412 
deviations mean square, 385-389 
effects of omitted variables, 394-397 
importance of different X-variables,
398-400 

interpretation of coefficients, 393-397 
partial regression coefficient, 382 
prediction of individual observation, 
392 

prediction of population line, 392 
purposes, 381 

selection of variates for prediction, 412- 
414 

standard error of a deviation, 392
standard errors of regression coeffi- 
cients, 391 

testing a deviation, 392-393 
tests of regression coefficients, 386-388 
non-linear in some parameters, 465-471
general method of fitting, 465-467 
parabolic, 453-456 

prediction of individual observation, 155-157

prediction of the population line, 153 
prediction of X from Y, 159-160
relation to correlation, 175-177, 188-190 
shortcut computation, 139
situation when X varies from sample to 
sample, 149-150 
testing a deviation, 157-158
tests for linearity, 453-459 
Rejection of observations 


in analysis of variance, 321-323 
Relative amount of information, 311 
Relative efficiency, 46 
of range, 46 

Relative rate of increase, 450 
Replications, 299 
Residuals, 300-301, 305-307 
Response curve 
polynomial, 349-351 
Response surface, 346 
example of fitting, 354-358 
Ridits, 246 
Rounding errors, 81
effect on accuracy of X̄ and s, 81

Sample, 4, 29 
cluster, 511, 513-515 
non-random, 509 
probability, 508-509 
random, 10-11, 30, 505, 511
stratified random, 507, 520-527 
systematic, 519
Sample mean, X̄, 39

calculation from a frequency distribution, 
80-83 

frequency distribution of, 51
Sample standard deviation s, 44
Sampling fraction, 512 
unequal, 507 
Sampling unit, 509 
Scales with limited values, 132
Scheffé’s test, 271
Scores 

assigned to ordered classifications, 243-246

Selection of candidates, 189 
Selection of variates for prediction, 412-414 
Self-pairing, 91, 97 
Self-weighting estimate, 521 
Semi-logarithmic graph paper, 450 
Series of experiments, 375-377 
Sets of 2 x 2 tables, 253-256 
Sheppard’s correction, 83 
Sign test, 125-127 
efficiency of, 127 
table of significance levels, 554 
Signed rank test, 128 
significance levels, 129, 555 
Significance 
level, 27 

tests of. See Tests of significance.

Simple random sampling, 505-507 
of cluster units, 513-515 
properties of estimates, 511-515 
size of sample, 516-518

Size of sample 

for comparing two proportions, 221-222 
for estimating population mean, 58
for tests of significance when comparing
means, 111-114

in sampling finite populations, 516-518 
in two-stage (nested) sampling, 281
within strata, 523-526 
Skewness, 72 
test of, 86 
table, 552
Smoothing, 447 

Spearman’s rank correlation coefficient, 194 
Split-plot (nested) design, 369 
analysis of variance, 370-373
comparison with randomized blocks, 373 
reasons for use, 369-370 
Square roots 
method of finding, 541 
table, 573-575

Square root transformation, 325-327
Standard deviation

of estimates from data 
adjusted difference, 423 
difference, 100, 104, 106, 115, 190 
g₁ for skewness, 86
g₂ for kurtosis, 87
mean of random sample, 50, 512 
median, 124 

population total, 51, 513 
regression coefficient, 138, 391 
sample total, 51 
sum, 190 

transformed correlation, 185
variance, 89
of population 
binomial, 207-208 
normal, 32 
Poisson, 225 

Standard error, 50. See also Standard deviation.

Standard normal deviate, 36 
Standard normal variate, 36
Standard partial regression coefficient, 398 
Step up and step down methods, 413 
Stratified random sampling, 11, 507, 520 
for attributes, 526-527
optimum allocation, 523-526 
proportional allocation, 521-523 
reasons for use, 520 
Stream extension, 189
Structural regression coefficient, 165 
Studentized Range test, 272-273
shortcut computation using ranges, 275 
table, 568

Student’s t-distribution, 59
table, 549
Sub-class numbers 
equal 47 s ° 


equal within rows, 477
proportional, 478 
unequal, 472 
Sub-plots, 369 

Sub-sampling. See Two-stage sampling.
Sum 

of products, 136 
correction for means, 141 
of squares, 44-45 
correction for mean, 48-49
Systematic sampling, 519 

t (Student’s t-distribution), 59
table, 549 

Tests of significance, 26-30 
goodness of fit test, χ², 84-85
in analysis of covariance, 423-425
in R x C contingency tables, 250-252
in 2 x C contingency tables, 238-243, 246-249

binomial proportion, 26-28, 211-213
all differences among means, 271-275
correlation coefficient, 184-188
difference between means of independent samples, 100-105, 114-116
difference between means of paired samples, 93-95, 97-99

difference between two binomial proportions, 213-221

equality of two correlated variances, 195-197

equality of two variances, 116
goodness of fit of distributions, 236-237
homogeneity of Poisson samples, 232-236
homogeneity of variances, 296-298
linear trend in proportions, 246-248 
linearity of regression, 453-460 
multiple correlation coefficient, 402
rank correlation coefficient, 194
single classification with estimated frequencies, 236-238

single classification with equal frequencies, 231-234

single classification with specified frequencies, 228-231
t-test based on range, 120
test of skewness, 86 
tests of kurtosis, 86-88 
Three-stage sampling, 285-288, 533 
allocation of sample sizes, 533
Transformation, 277 
logarithmic, 329-330 
logit, 494, 497-503
to remove non-additivity, 331-332
to stabilize variance, 325
angular (arcsin), 327-329
square root, 325-327
use in fitting non-linear relations, 448-453
Treatments, definition, 91
Treatment combination, 340 
Tukey’s tests for additivity, 331-337
Two-stage sampling, 528
reasons for use, 528

with primary units of equal size, 529-533
choice of sample and sub-sample sizes,
531-533

with primary units of unequal sizes, 534-536

Two-way classifications, frequencies, 238-243

R x C tables, 250-253
sets of 2 x 2 tables, 253-257
2 x C tables, 238-243, 246-250
2 x 2 (fourfold) tables, 215-223
Two-way classifications, measurements
additivity assumption, 302, 330-334 
analysis of variance, 299-301
mathematical model, 302-307
rejection of observations, 321-323
test of additivity, 331-334
with unequal numbers, 472
complications involved, 472-475
equal weights within rows, 477-478
least squares analysis, R x C table, 488-493

method of proportional numbers, 478-483
R x 2 table, 484-487
2 x 2 table, 483-484
unweighted analysis, 475-477
Two-way classifications, proportions
analysis in logit scale, 497-503
analysis in proportions scale, 495-497
approaches to analysis, 493-495


Unbiased estimate, 45 
Uniform distribution, 51
in relation to rounding errors, 81 
Unweighted means, method of, 475-477 

Variance, 53 

analysis. See Analysis of variance.
comparison of two correlated variances, 
195-197 

comparison of two variances, 116
components. See Components of variance.

confidence interval for, 74 
of difference, 100, 104, 106, 115, 190 
of sum, 190 
ratio F, 265 

distribution under general hypothesis,
280 

table, 560-567

test of homogeneity, 296-298
Variation, coefficient of, 62

Weighted mean 
in stratified sampling, 521
of differences in proportions, 255
of ratios, 170

of regression coefficients using estimated weights, 438

of transformed correlations, 187
Welch-Aspin test, 115
Wilcoxon signed rank test, 128 

Z or z, standard normal variate, 51
z-transformation of a correlation coefficient, 185
tables, 558-559 







